1992  PPRC 


PHYSICIAN 
PAYMENT 
REVIEW 
COMMISSION 


CONFERENCE 
ON 
PROFILING 


No.  92-2 


2120  L  Street,  NW 
Suite  510 
Washington,  DC  20037 

(202)  653-7220 
FAX  (202)  653-7238 


PREFACE 


Profiling  is  an  analytic  tool  that  provides  a  means  to  compare  practice  patterns  of 
providers  on  the  dimensions  of  cost,  service  use,  or  quality  of  care.  It  is  increasingly  being 
used  in  activities  related  to  utilization  review,  quality  improvement,  assessment  of  provider 
performance,  and  research  into  the  effectiveness  and  appropriateness  of  care.  Its 
expanded  use  is  related  to  the  growing  availability  of  data  on  which  profiling  can  be  based, 
as  well  as  the  spread  and  improvement  of  technology  for  processing  this  information. 

The  ease  with  which  profiling  can  be  done  may  be  outpacing  the  ability  to  use  profiling 
effectively  and  responsibly,  however.  As  profiling  has  become  more  widespread,  diverse 
groups  have  raised  concerns  over  the  quality  of  the  profiles  themselves,  the  uses  to  which 
they  have  been  put,  and  access  to  the  information  contained  in  them. 

In  January  1992,  the  Physician  Payment  Review  Commission  held  a  conference  on 
profiling  to  learn  what  is  known  about  the  appropriateness  of  its  present  uses  and  to 
identify  what  will  be  required  to  realize  the  full  potential  of  this  technique.  The  first  part 
of  the  conference  comprised  presentations  of  four  commissioned  papers.  This  was 
followed  by  a  panel  discussion  by  representatives  of  the  major  constituencies  affected  by 
profiling  (public  and  private  payers,  physicians,  PROs,  consumers,  and  researchers) 
focusing  on  the  expectations  and  concerns  of  affected  parties. 

The Commission's Annual Report to Congress 1992 contains a chapter based in part on the
content of this conference. That chapter, included as the first paper in this volume,
introduces the topic of profiling, illustrates some of the challenges of developing and using
profiles responsibly, and concludes by articulating the Commission's short- and long-term
goals for improving the ability of profiling to meet the needs of its users. The remainder
of this volume contains the four commissioned papers presented at the conference.
Donald Brand, Lois Quam, and Sheila Leatherman discuss the data needs of profiling
systems, and Barbara McNeil, Sarah Pedersen, and Constantine Gatsonis analyze the
potential and limitations of current profiling efforts. Stephen Schoenbaum and Katherine
Murrey describe the impact of profiles on medical practice; their paper includes an appendix
that compiles and summarizes the results of studies of the effectiveness of feedback of
profiling information. Emily Friedman concludes with a consumer perspective on access to
profiling information.


The  conference  greatly  advanced  the  Commission's  understanding  of  the  issues  concerning 
profiling  and  how  to  facilitate  its  use  in  the  future.  In  the  coming  year,  the  Commission 
plans  to  focus  its  work  on  data  systems  and  infrastructure  to  support  profiling,  and  it 
expects  to  issue  recommendations  to  Congress  in  its  1993  annual  report. 

The  Commission  greatly  appreciates  the  efforts  of  all  who  contributed  to  the  success  of 
the  conference.  The  Commission  especially  wants  to  thank  the  conference  faculty 
(Howard  Bailit,  Robert  Berenson,  James  Cannon,  Emily  Friedman,  Karen  Ignagni,  Robert 
Keller,  Sheila  Leatherman,  Philip  Lee,  Barbara  McNeil,  Kathleen  Means,  Stephen 
Schoenbaum,  Steven  Schroeder,  Albert  Siu),  the  coauthors  of  the  papers  (Donald  Brand, 
Constantine  Gatsonis,  Katherine  Murrey,  Sarah  Pedersen,  Lois  Quam),  the  post- 
conference  discussants  (John  Billings,  Peter  Budetti,  Philip  Caper,  Janet  Corrigan,  Carol 
Cronin,  Judith  Feder,  Judith  Miller  Jones,  Stanley  Jones,  Kathryn  Langwell,  John  Iglehart, 
Walter  McNerney,  Nigel  Roberts,  Richard  Sharpe),  and  those  who  provided  information 
to  the  paper  writers. 


Philip  R.  Lee,  M.D. 

Chairman 


CONTENTS 


Preface

Paper No.

1  Realizing the Potential of Profiling
   by Roz Diane Lasker, David W. Shapiro, and Anthony M. Tucker

2  Data Needs of Profiling Systems
   by Donald A. Brand, Lois Quam, and Sheila Leatherman

3  Current Issues in Profiles: Potentials and Limitations
   by Barbara J. McNeil, Sarah H. Pedersen, and Constantine Gatsonis

4  Impact of Profiles on Medical Practice
   by Stephen C. Schoenbaum and Katherine Oates Murrey

   Appendix 4A

5  Public Access to Profiling Information
   by Emily Friedman

PAPER  NO.  1 


REALIZING  THE  POTENTIAL  OF  PROFILING 


Authors: 

Roz  Diane  Lasker,  M.D. 
David  W.  Shapiro,  M.D.,  J.D. 
Anthony  M.  Tucker,  M.P.A. 


Address: 


Physician  Payment  Review  Commission 
2120  L  Street,  N.W. 
Suite  500 
Washington,  DC  20037 


REALIZING  THE  POTENTIAL  OF  PROFILING 


The  American  health  care  system  is  facing  serious  challenges  concerning  cost  and  quality. 
Rapidly  rising  health  expenditures  are  predicted  to  account  for  nearly  14  percent  of  the 
gross  national  product  in  1992.  The  patterns  and  organization  of  medical  practice  vary 
widely  across  the  country,  but  there  is  uncertainty  as  to  what  approaches  work  best  and 
how care can be provided most efficiently. Increasingly complex and expensive new
methods of diagnosis and treatment are continually being introduced. At the same time,
there  is  growing  concern  that  a  substantial  proportion  of  patients  may  be  receiving  care 
that  is  of  little  or  no  benefit  to  them,  while  others  are  not  receiving  care  that  is  known  to 
be  effective. 

Recent  responses  to  these  challenges  have  linked  efforts  aimed  at  obtaining  and 
disseminating  information  about  appropriate,  cost-effective  approaches  to  care  with 
incentives  and  systems  that  will  encourage  providers  to  put  such  information  into  practice. 
The  federal  government  has  made  a  strong  commitment  to  funding  medical  effectiveness 
and  outcomes  research,  and  parties  in  both  the  public  and  private  sectors  have  become 
actively  involved  in  the  development  of  practice  guidelines.  Financial  incentives  to 
hospitals,  groups  of  physicians,  and  individual  physicians  have  changed  through  the 
implementation  of  diagnosis-related  groups,  Volume  Performance  Standards,  the  resource- 
based  Medicare  Fee  Schedule,  and  various  forms  of  capitated  practice  arrangements. 
Efforts to manage care have also grown, including increasingly rigorous
preauthorization and utilization review, policies limiting patients' choices of providers, and
quality management techniques designed to identify problems and improve performance.

Many  see  an  important  role  for  profiling  in  these  diverse  efforts.  Profiling  is  an 
epidemiologic  technique  that  focuses  the  assessment  of  health  care  delivery  on  patterns 
of  care  rather  than  on  individual  occurrences  of  care.  Information  obtained  from  large 
databases  is  used  to  identify  a  provider's  pattern  of  practice  and  compare  it  with  those  of 
similar  providers  or  with  an  accepted  standard  of  care.  Theoretically  at  least,  profiling  can 
be  an  effective  and  relatively  nonintrusive  way  to  identify  over-  and  underutilization  of 
services,  to  uncover  problems  with  the  efficiency  and  quality  of  care,  and  to  assess  provider 
performance.  This  makes  it  an  attractive  and  potentially  useful  tool  for  health  care 
professionals,  patients,  payers,  medical  educators,  and  policymakers. 

The  expanding  use  of  profiling  is  related  not  only  to  its  broad  applicability,  but  also  to  the 
growing  availability  of  data  on  which  it  can  be  based,  such  as  claims  databases,  and  the 
spread  and  improvement  of  technology  for  processing  this  information.  The  availability 
of  information  may  be  outpacing  the  ability  to  use  profiling  effectively  and  responsibly, 
however. As profiling has become more widespread, diverse groups have raised concerns
over  the  quality  of  the  profiles  themselves,  the  uses  to  which  they  have  been  put,  and 
access  to  the  information  contained  in  them. 

In  January  1992,  the  Physician  Payment  Review  Commission  (PPRC)  held  a  conference 
to  learn  about  the  appropriateness  of  present  uses  of  profiling  and  to  identify  what  will  be 
required  to  realize  the  full  potential  of  this  technique  as  it  continues  to  develop  in  the 
future.  At  this  conference,  a  series  of  formal  papers  was  presented  addressing  issues 
related  to  data  needs,  requirements  for  developing  valid  and  relevant  profiles,  the  impact 
of  profiles  on  medical  practice,  and  controversies  surrounding  public  access  to  profiling 
information  (the  papers  are  collected  in  this  volume).  These  issues  were  then  discussed 
by  a  panel  of  representatives  of  the  major  constituencies  affected  by  profiling,  including 
public  and  private  payers,  physicians,  Peer  Review  Organizations  (PROs),  consumers,  and 
researchers. 

This  paper  is  based,  in  part,  on  the  content  of  that  conference.1  It  begins  by  reviewing 
the  basic  concepts  that  underpin  profiling.  The  paper  then  describes  the  roles  that 
profiling  can  potentially  play  in  quality  improvement,  assessment  of  provider  performance, 
and  utilization  review.  It  continues  with  case  studies  illustrating  the  types  of  problems  that 
have  arisen  in  actual  usage,  followed  by  a  discussion  of  the  issues  that  need  to  be 
addressed to resolve them. The final section discusses the roles that profiling can play,
in both the short and the long term, in achieving the goals of health care reform.

WHAT  IS  PROFILING? 

Profiling  is  an  analytic  tool  that  uses  epidemiologic  methods  to  compare  practice  patterns 
of  providers  on  the  dimensions  of  cost,  service  use,  or  quality  (process  and  outcome)  of 
care. The provider being profiled can be an individual practitioner, a group of
practitioners, or a health care organization, such as a hospital or health maintenance
organization (HMO).2 The provider's pattern of practice is expressed as a rate: some
measure  of  utilization  (costs  or  services)  or  outcome  (functional  status,  morbidity,  or 
mortality)  aggregated  over  time  for  a  defined  population  of  patients  under  the  provider's 
care.  The  number  of  sigmoidoscopy  claims  an  internist  submits  to  the  Medicare  program 
per  100  Medicare  patients  he  or  she  sees  per  year  is  an  example  of  a  profiling  rate. 
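The sigmoidoscopy example can be made concrete with a short calculation. The sketch below is an illustration, not a method described in this paper: it assumes a year of claims arrives as hypothetical (provider, procedure code) pairs and that the number of patients each provider saw is already known. The code "SIG" is a placeholder, not an actual billing code.

```python
from collections import Counter


def profiling_rates(claims, patients_seen, code, per=100):
    """Claims for `code` per `per` patients seen, for each provider, over one year.

    claims: iterable of (provider_id, procedure_code) pairs (hypothetical layout).
    patients_seen: dict mapping provider_id -> number of patients seen that year.
    """
    counts = Counter(provider for provider, proc in claims if proc == code)
    return {
        provider: per * counts.get(provider, 0) / n
        for provider, n in patients_seen.items()
        if n > 0  # a rate is undefined for a provider with no patients
    }


# Illustrative inputs only; "SIG" stands in for a sigmoidoscopy billing code.
claims = [("dr_a", "SIG")] * 6 + [("dr_b", "SIG")] * 2 + [("dr_a", "OFFICE")]
rates = profiling_rates(claims, {"dr_a": 200, "dr_b": 100}, code="SIG")
print(rates)  # {'dr_a': 3.0, 'dr_b': 2.0}
```

The denominator is patients seen rather than claims submitted, so a provider who bills many unrelated services is not penalized.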

Comparisons in profiling are made by relating the utilization or outcome rate for a
particular provider to a norm, which can be either a rate derived from the practice
patterns of other similar providers, called a practice-based norm, or a rate that would be
expected if providers followed an accepted practice guideline, called a standards-based
norm. Practice-based norms do not necessarily reflect appropriate care. For example, the
mean rate at which internists currently perform sigmoidoscopies for Medicare patients may
be too high or too low. Standards-based norms reflect appropriate care to the extent that
they are based on practice guidelines grounded in sound scientific evidence (PPRC 1992).

1 This paper was originally published as Chapter 9 of the Commission's Annual Report to Congress 1992
(PPRC 1992).

2 The focus of profiling on the provider distinguishes it from technology assessment and outcomes research,
in which the independent variables are the service and the measure of outcome, respectively.

Profiles  are  usually  designed  to  generate  some  type  of  action,  which  is  taken  if  the 
utilization  or  outcome  rate  for  a  particular  provider  differs  from  the  norm  by  a  certain 
amount.  For  example,  an  internist  might  be  notified  by  the  Medicare  program  if  his  or 
her Medicare sigmoidoscopy rate is more than two standard deviations above the mean
for  other  internists  practicing  in  the  same  community.  Alternatively,  if  a  standards-based 
norm  is  available  for  performing  sigmoidoscopy  in  the  Medicare  population,  all  internists 
might  be  notified  of  the  extent  to  which  their  rates  are  appropriate  or  inappropriate  (that 
is,  whether  their  rates  meet  the  norm  or  are  too  high  or  too  low  in  relation  to  the  norm). 
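The two notification rules above can be sketched as follows. This is a minimal illustration under stated assumptions: rates are already computed per provider, the practice-based rule compares each provider with the mean and standard deviation of the *other* providers in the community (following the wording above), and the standards-based target and tolerance band are hypothetical numbers, since the text does not say how wide "meets the norm" should be.

```python
from statistics import mean, stdev


def practice_based_outliers(rates, k=2.0):
    """Flag providers whose rate exceeds the mean of their peers by more than k SDs."""
    flagged = []
    for provider, rate in rates.items():
        peers = [r for p, r in rates.items() if p != provider]  # "other internists"
        if len(peers) < 2:
            continue  # a standard deviation needs at least two peer rates
        if rate > mean(peers) + k * stdev(peers):
            flagged.append(provider)
    return sorted(flagged)


def standards_based_feedback(rates, target, tolerance):
    """Tell every provider whether the rate meets a guideline-derived target."""
    feedback = {}
    for provider, rate in rates.items():
        if rate > target + tolerance:
            feedback[provider] = "too high"
        elif rate < target - tolerance:
            feedback[provider] = "too low"
        else:
            feedback[provider] = "meets norm"
    return feedback


community = {f"dr_{i}": 2.0 for i in range(9)}
community["dr_9"] = 10.0
print(practice_based_outliers(community))  # ['dr_9']
print(standards_based_feedback({"x": 7.0, "y": 5.5, "z": 3.0}, target=5.0, tolerance=1.0))
# {'x': 'too high', 'y': 'meets norm', 'z': 'too low'}
```

The contrast mirrors the text: the first rule reaches only a few outliers, while the second gives every provider feedback.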

By  focusing  on  aggregate  patterns  of  practice  (rather  than  on  patterns  of  individual 
practitioners),  profiling  can  be  used  to  compare  the  medical  care  provided  by  different 
organizations  or  received  by  different  populations  of  patients.  For  example,  this  type  of 
profiling  can  compare  the  rate  with  which  outpatient  sigmoidoscopies  are  performed  in 
different  geographic  regions  of  the  country,  the  rate  with  which  cesarean  sections  are 
performed  in  different  HMOs,  or  the  mortality  rate  following  coronary  artery  bypass 
surgery  in  different  hospitals. 

WHAT  ARE  THE  POTENTIAL  USES  OF  PROFILING? 

Much of the potential of profiling lies in three important areas in which it can, in theory,
be applied: quality improvement, assessment of provider performance, and utilization
review.

Application  of  Profiling  to  Quality  Improvement 

Quality  improvement  is  geared  toward  identifying  problems  and  overcoming  them  through 
changes  in  performance.  In  health  care,  problems  are  often  indicated  by  poor  or  variable 
patient  outcomes.  Methods  for  improving  outcomes  may  or  may  not  be  apparent, 
however,  and  may  ultimately  require  changes  on  the  part  of  physicians,  nonphysician 
practitioners,  patients,  or  health  care  systems. 

Profiling  can  theoretically  play  a  role  in  several  aspects  of  quality  improvement.  First,  it 
can  be  used  to  target  potential  problem  areas  by  identifying  conditions  or  procedures 
where there are large variations in outcome. Such profiles would measure differences in
the  frequency  with  which  specific  outcomes  occur  when  patients  with  a  particular  condition 
or  patients  undergoing  a  particular  procedure  are  cared  for  by  different  providers. 

Profiling can also help determine how, and by whom, performance should be changed
to improve outcome. Through feedback and discussion of the results of an outcomes
profile, for example, physicians may be able to identify specific differences in the process
of care (not necessarily attributable to them) that are likely to underlie the differences in
outcome. Such hypotheses can then be tested by running profiles on both the process and
outcome of care and looking for correlations between the two, or by modifying the process of
care across providers to see if, on reprofiling, differences in outcome are reduced or
overall outcome is improved.
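Running process and outcome profiles side by side can be reduced to a correlation across providers. The sketch below assumes a hypothetical process rate (for instance, the share of patients receiving a particular step of care) and an outcome rate for the same providers. A strong correlation would support a hypothesis worth testing; it would not show that the process causes the outcome.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined when a profile does not vary")
    return cov / (sx * sy)


def process_outcome_correlation(process_rates, outcome_rates):
    """Correlate a process profile with an outcome profile over shared providers."""
    providers = sorted(set(process_rates) & set(outcome_rates))
    return pearson([process_rates[p] for p in providers],
                   [outcome_rates[p] for p in providers])


# Hypothetical profiles for three hospitals.
process = {"h1": 0.9, "h2": 0.7, "h3": 0.5}     # share receiving the process step
outcome = {"h1": 0.02, "h2": 0.04, "h3": 0.06}  # complication rate
print(round(process_outcome_correlation(process, outcome), 3))  # -1.0
```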

Application  of  Profiling  to  Assessment  of  Provider  Performance 

Valid  methods  for  assessing  provider  performance  are  important  to  many  parties  for  a 
wide  variety  of  purposes.  For  example,  they  can  provide  a  basis  for  making  decisions 
about  certification,  credentialing,  and  granting  of  hospital  privileges;  monitoring 
compliance with practice guidelines or quality improvement measures; or selecting
physicians, hospitals, and health plans, whether by payers or perhaps by consumers. They
can  also  be  used  to  assess  the  effectiveness  of  medical  training  and  to  identify  areas  that 
may  require  more  effective  undergraduate,  graduate,  or  continuing  medical  education. 

Profiling  potentially  can  play  an  important  role  in  this  regard  since  it  can  be  used  to 
identify  providers  who  do  and  do  not  meet  a  certain  standard  of  care.  For  example,  for 
a  particular  condition  or  procedure,  it  may  be  possible  to  identify  a  specific  process 
measure  that  is  associated  with  appropriate,  cost-effective  approaches  to  care,  such  as  a 
service  that  is  an  integral  part  of  an  accepted  practice  guideline.  A  particular  provider's 
utilization  rate  for  this  service  could  then  be  compared  with  the  rate  that  would  be 
expected  for  providers  who  followed  the  practice  guideline.  Performance  would  be 
"acceptable"  if  the  provider's  rate  does  not  deviate  substantially  from  this  norm. 

Profiling  is  an  inherently  flexible  technique  for  assessing  provider  performance. 
Depending  on  the  purpose,  the  standard  of  care  can  be  based  on  either  practice  guidelines 
(i.e.,  standards-based  norms)  or  actual  provider  practice  patterns  (i.e.,  practice-based 
norms).  Acceptable  performance  can  be  defined  as  a  small  or  large  deviation  from  the 
norm.  The  assessment  can  be  focused  (i.e.,  based  on  a  profile  targeted  to  one  condition 
or  procedure)  or  broad  (i.e.,  based  on  the  results  of  multiple  profiles  covering  a  wide  range 
of  conditions  or  procedures).  Finally,  the  provider  being  assessed  can  be  an  individual 
practitioner  or  a  health  care  organization. 
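The flexibility described above amounts to a handful of parameters. In the sketch below, which is illustrative and not drawn from any existing system, each profile carries its own norm and a label for whether that norm is standards-based or practice-based; `tolerance` (a relative band around the norm) and `required_share` (how many profiles must pass in a broad assessment) are hypothetical parameters, since the text leaves both choices open.

```python
from dataclasses import dataclass


@dataclass
class Profile:
    condition: str      # the condition or procedure the profile targets
    provider_rate: float
    norm: float         # target rate for this condition
    norm_type: str      # "standards-based" or "practice-based"


def within_norm(profile, tolerance):
    """True if the rate lies within a relative tolerance of the norm."""
    return abs(profile.provider_rate - profile.norm) <= tolerance * profile.norm


def assess(profiles, tolerance=0.2, required_share=1.0):
    """Focused assessment with one profile, broad assessment with several.

    Performance is judged acceptable when at least `required_share` of the
    profiles fall within `tolerance` of their norms.
    """
    if not profiles:
        raise ValueError("at least one profile is needed")
    passed = sum(within_norm(p, tolerance) for p in profiles)
    return passed / len(profiles) >= required_share


# Hypothetical rates for one provider.
a = Profile("condition_a", 0.55, 0.60, "standards-based")
b = Profile("procedure_b", 40.0, 25.0, "practice-based")
print(assess([a]))                          # True  (focused)
print(assess([a, b]))                       # False (broad, all must pass)
print(assess([a, b], required_share=0.5))   # True  (broad, half must pass)
```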


Application  of  Profiling  to  Utilization  Review 

Theoretically,  profiling  can  play  a  role  in  utilization  review  by  identifying  outliers  for  more 
detailed  review  or  by  encouraging  more  appropriate  delivery  of  services  by  all  providers. 
Both  approaches  could  make  utilization  review  more  efficient,  exempting  most  physicians 
from  intrusive  case-by-case  review. 

In  the  outlier  approach,  which  is  the  usual  way  of  thinking  about  profiling  in  this  context, 
profiles  are  used  to  compare  the  rates  with  which  individual  physicians  use  particular 
services  with  rates  in  similar  practices  or  with  target  rates  based  on  practice  guidelines. 
Outliers  are  defined  as  physicians  whose  rates  deviate  substantially  from  the  norm. 
Detailed  case-based  review  can  then  be  targeted  to  physicians  whose  practices  suggest  a 
greater  likelihood  of  over-  or  underutilization. 

Profiling  also  has  the  potential  to  improve  the  frequency  with  which  services  are  provided 
by  the  "average"  physician.  For  example,  in  cases  where  a  standards-based  norm  is 
available  for  a  particular  service,  all  profiled  physicians  could  be  notified  of  the  extent  to 
which  their  utilization  rates  differ  from  the  norm.  Practice  guidelines  could  be  distributed 
at  the  same  time  to  inform  physicians  how  they  might  change  their  behavior  so  as  to  use 
the  service  more  appropriately.  In  cases  where  standards-based  norms  and  practice 
guidelines  are  not  available,  it  may  be  possible  to  bring  physicians  in  a  community  together 
to review the profiling results. In some cases, they may be able to identify a utilization rate
(or range of rates) that they believe is appropriate and to suggest ways that physicians could
achieve this rate without compromising patient outcome.

CASE  STUDIES  ILLUSTRATING  CURRENT  ISSUES  IN  PROFILING 

Currently,  many  organizations  are  using  profiling  for  utilization  review  and  quality 
improvement,  and  there  is  growing  interest  in  using  profiling  to  assess  provider 
performance.  These  applications  have  not  always  been  straightforward,  however. 
Concerns  raised  by  various  groups  suggest  that  a  number  of  issues  will  need  to  be 
addressed  before  the  full  potential  of  profiling  can  be  delineated  and  realized.  The 
following case studies illustrate some of the problems that have arisen and, especially in
Case Study 2, suggest ways in which some of the limitations of current profiles may be
successfully overcome.

Case  Study  1:  Release  of  Mortality  Data 

Imagine  that  you  are  a  hospital  administrator  and  that  your  state  health  department  has 
just  released  to  the  public  coronary  artery  bypass  graft  (CABG)  mortality  rates  for  all  of 
the hospitals in the state that perform this procedure. Your hospital's ranking is low. The
board of trustees calls you in, saying, "This looks bad for us; do something to make us look
better." 

You  bring  together  all  the  cardiac  surgeons  who  perform  CABGs  in  your  hospital.  They 
ask  a  lot  of  questions  about  the  profile  on  which  the  ranking  is  based,  unconvinced  that 
it  reflects  true  differences  in  the  quality  of  care.  Among  their  questions  and  concerns  are 
the  following. 

•  What  databases  were  used  in  the  analysis?  How  were  hospital  data  linked  to 
mortality  data?  How  accurate  and  reliable  is  the  information  in  the  linked 
database? 

•  How  many  CABG  operations  were  performed  at  each  hospital  per  year?  How 
likely  is  it  that  the  differences  across  hospitals  are  due  to  chance  alone? 

•  Does the profile account for differences in the mix of CABG operations each
hospital performs? (The surgeons would expect higher mortality rates for
hospitals performing higher proportions of emergency CABG, CABG following
angioplasty, repeat CABG, or multiple-vessel CABG.)

•  Does  the  profile  account  for  differences  in  the  patient  population  undergoing 
CABG  in  each  hospital  that  could  account  for  differences  in  mortality? 

•  What  time  frame  was  used  in  the  analysis?  How  likely  is  it  that  death  during 
that  period  was  related  to  the  CABG  rather  than  other  causes? 

•  Does  the  ranking  compare  similar  hospitals  (e.g.,  tertiary  care  teaching 
hospitals)  or  all  hospitals  in  the  state? 
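The surgeons' case-mix questions are usually answered with some form of risk adjustment. One common approach, shown here as an assumption rather than the method the state in this case used, is indirect standardization: a statewide risk model (not shown, and assumed to exist) gives each patient a predicted probability of death, and the hospital's observed deaths are divided by the sum of those probabilities.

```python
def observed_to_expected(deaths, predicted_risks):
    """Observed-to-expected (O/E) mortality ratio for one hospital.

    deaths: patients who died within the profile time frame.
    predicted_risks: each patient's predicted probability of death, taken from
        a statewide risk model that is assumed to be available.
    """
    expected = sum(predicted_risks)
    if expected <= 0:
        raise ValueError("expected deaths must be positive")
    return deaths / expected


# Hypothetical hospitals: A treats high-risk patients, B low-risk ones.
a_risks, a_deaths = [0.125] * 40, 5   # crude mortality 12.5%
b_risks, b_deaths = [1 / 64] * 64, 2  # crude mortality about 3.1%
print(observed_to_expected(a_deaths, a_risks))  # 1.0
print(observed_to_expected(b_deaths, b_risks))  # 2.0
```

Hospital B looks better on crude mortality yet has twice the deaths its case mix predicts. An O/E ratio is only as good as the risk model, however; case-mix factors the model omits, such as the CABG mix the surgeons mention, still bias the comparison.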

After  obtaining  answers  to  these  questions  (to  the  extent  possible),  you  meet  with  the 
surgeons  again.  They  do  not  believe  that  the  ranking  provides  them  or  the  public  with 
much  useful  information.  The  differences  in  mortality  rates  between  closely  ranked 
hospitals  do  not  appear  to  be  statistically  significant.  The  design  of  the  profiles  accounts 
for  some  but  not  all  of  the  important  differences  in  case  mix.  More  important,  the 
surgeons  do  not  know  what  CABG  mortality  rate  would  be  appropriate  in  their  hospital 
or  what  rate  they  should  aim  for,  given  the  types  of  patients  they  see  and  the  mix  of 
CABG  operations  they  perform. 
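Whether closely ranked hospitals differ by more than chance can be checked roughly with a confidence interval for each mortality proportion. The sketch below uses the Wilson score interval (z = 1.96 for approximately 95 percent); the counts are hypothetical. Overlapping intervals are only a coarse screen, and a formal two-sample test is more precise, but the example also shows why surgeon-level rates built on small caseloads are less reliable.

```python
from math import sqrt


def wilson_interval(deaths, cases, z=1.96):
    """Approximate 95% Wilson score interval for a mortality proportion."""
    if cases <= 0:
        raise ValueError("cases must be positive")
    p = deaths / cases
    denom = 1 + z**2 / cases
    centre = (p + z**2 / (2 * cases)) / denom
    half = z * sqrt(p * (1 - p) / cases + z**2 / (4 * cases**2)) / denom
    return centre - half, centre + half


def intervals_overlap(a, b):
    return a[0] <= b[1] and b[0] <= a[1]


hospital_a = wilson_interval(12, 400)  # 3.0% crude mortality
hospital_b = wilson_interval(16, 400)  # 4.0% crude mortality
print(intervals_overlap(hospital_a, hospital_b))  # True

surgeon = wilson_interval(1, 40)  # one surgeon's much smaller caseload
print(surgeon[1] - surgeon[0] > hospital_a[1] - hospital_a[0])  # True
```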

You  share  with  them  the  call  for  action  from  the  board  of  trustees.  Some  surgeons  suggest 
that  the  quickest  ways  to  improve  the  ranking  would  be  for  the  hospital  to  stop  accepting 
all  high-risk  patients  for  CABG  surgery,  stop  performing  all  high-risk  CABG  procedures, 
and try to keep all post-CABG patients alive in the intensive care unit for the profile time
frame,  regardless  of  cost  or  prognosis.  However,  the  surgeons  do  not  believe  that  these 
approaches  would  do  much,  if  anything,  to  improve  the  quality  of  CABG  care. 

You  mention  that  the  state  has  recently  provided  the  hospital  with  CABG  mortality  rates 
for  individual  surgeons  and  that  the  board  has  asked  you  to  consider  discontinuing  CABG 
privileges  for  surgeons  with  the  highest  rates.  The  surgeons  respond  that,  because  each 
surgeon  in  the  hospital  performs  relatively  few  CABG  operations  per  year  and  because 
each  surgeon  performs  a  different  mix  of  CABG  operations,  the  ranking  of  individual 
surgeons  is  likely  to  be  even  less  reliable  than  that  for  hospitals.  Even  if  it  were  not,  they 
believe  the  approach  would  be  shortsighted.  The  surgeons  who  would  lose  their  privileges 
at the hospital would probably obtain them elsewhere, and the surgeons who retained their
privileges  would  have  little  motivation  to  improve  their  care.  You  also  mention  that  a 
local  newspaper  is  suing  the  state  for  public  release  of  surgeon-specific  mortality  rates. 
The surgeons wonder how the public will use this information: whether it will actually
protect  patients  from  incompetent  surgeons  or  unnecessarily  undermine  the  trust  patients 
have  in  competent  surgeons. 

At  the  end  of  the  meeting,  the  surgeons  acknowledge  that  the  profiling  results  may  actually 
indicate  true  differences  in  CABG  mortality  rates  across  the  state.  Although  they  would 
like  to  know  how  CABG  care  could  be  improved  in  the  hospital,  they  are  frustrated 
because the profile does not identify where quality improvement efforts should be focused
(for example, on anesthesiologists, surgeons, other members of the operating room team,
the recovery room, or the intensive care unit) or how the process of care could be
changed to improve patient outcome.

Case  Study  2:  Small-Area  Analysis 

Now  imagine  that  you  are  a  health  services  researcher  working  for  the  health  department 
in a small, predominantly rural state. You have recently completed a profile of
discretionary  pediatric  admissions.  The  results  show  substantial  differences  for  a  set  of 
diagnosis-related  groups  in  the  number  of  admissions  per  capita  (age-  and  sex-adjusted) 
per  year  across  the  state's  hospital  service  areas.  There  is  no  evidence  that  these  different 
admission  rates  are  associated  with  any  difference  in  outcome. 
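The age- and sex-adjusted rates in this case can be produced by direct standardization: each area's stratum-specific admission rates are applied to a common standard population. The sketch below is illustrative; the strata are simplified to two age bands (in practice each stratum would be an age-sex cell), and all counts are hypothetical.

```python
def directly_standardized_rate(admissions, population, standard, per=1000):
    """Admissions per `per` residents, adjusted to a standard population.

    admissions, population: dicts keyed by stratum for one hospital service area.
    standard: dict giving the standard population's size in each stratum.
    """
    total = sum(standard.values())
    rate = 0.0
    for stratum, std_size in standard.items():
        stratum_rate = admissions[stratum] / population[stratum]
        rate += stratum_rate * std_size / total
    return per * rate


standard = {"0-4": 1000, "5-17": 1000}  # hypothetical statewide weights
area_a = directly_standardized_rate({"0-4": 30, "5-17": 10},
                                    {"0-4": 1000, "5-17": 2000}, standard)
area_b = directly_standardized_rate({"0-4": 15, "5-17": 20},
                                    {"0-4": 500, "5-17": 4000}, standard)
print(round(area_a, 2), round(area_b, 2))  # 17.5 17.5
```

The crude rates differ (about 13.3 versus 7.8 admissions per 1,000 children) only because area A has relatively more young children; after adjustment the two areas are identical.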

You  ask  a  representative  group  of  the  state's  pediatricians  to  review  this  information. 
They  acknowledge  that  they  do  not  know  what  the  most  appropriate  admission  rate  is,  but, 
given  the  outcome  data,  they  believe  that  it  is  probably  near  the  low  end  for  the  state. 

You  do  more  focused  profiling  in  the  medium-  and  high-rate  areas  to  identify  the 
proportion  of  patients  each  hospital  and  pediatrician  is  admitting.  Then  you  convene  all 
of  the  pediatricians  in  each  hospital  service  area  to  review  the  results.  This  information 
is  not  made  available  to  the  public.  Although  the  admission  rates  in  the  profile  are  not 
adjusted to account for differences in patient population characteristics (number and case
mix) across providers, the pediatricians think the information is still useful since they are
quite familiar with each other's practices. In most areas, possible causes for the higher
rates  are  identified  in  the  course  of  the  discussion.  For  example,  in  one  area  most  patients 
were  admitted  by  two  pediatricians  who  had  recently  set  up  practice.  Reprofiling  one  year 
later  shows  that  differences  across  hospital  service  areas,  as  well  as  the  average  admission 
rate  for  the  state,  have  markedly  decreased. 

Case  Study  3:  Profiling  of  Physician  Claims 

Imagine  that  you  are  in  charge  of  utilization  review  activities  for  a  large  insurer.  You 
think  it  is  likely  that  outpatient  chest  X-rays  are  provided  more  often  than  necessary  and 
want  to  profile  physicians  on  their  use  of  this  procedure.  Using  your  administrative  claims 
file,  you  are  able  to  compare  the  rate  at  which  physicians  in  a  particular  specialty  and  type 
of  practice  bill  for  CPT  code  71020  (radiologic  examination  of  the  chest,  frontal  and 
lateral)  per  100  patients  insured  under  your  program  per  year. 

Not surprisingly, you note wide variation in chest X-ray rates. You inform physicians
whose rates are higher than those of 95 percent of the other physicians in their peer group
that they will be reviewed if their rates remain at this level for three consecutive years.
Other physicians do not receive reports of their chest X-ray rates.
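The insurer's rule can be sketched in two steps: flag physicians above the 95th percentile of their peer group each year, then select those flagged in three consecutive years. The sketch approximates "higher than 95 percent of the other physicians" with the 95th percentile of the whole group, which is an assumption; the rates are hypothetical claims for CPT 71020 per 100 insured patients.

```python
from statistics import quantiles


def high_outliers(rates):
    """Physicians whose rate exceeds the 95th percentile of their peer group.

    rates: dict physician_id -> chest X-ray claims per 100 patients per year,
        for a single peer group (same specialty and type of practice).
    """
    # n=20 yields 19 cut points; the last one is the 95th percentile.
    cutoff = quantiles(rates.values(), n=20, method="inclusive")[-1]
    return {doc for doc, rate in rates.items() if rate > cutoff}


def due_for_review(yearly_outliers, years=3):
    """Physicians flagged as outliers in each of the last `years` years."""
    recent = yearly_outliers[-years:]
    if len(recent) < years:
        return set()
    return set.intersection(*recent)


peer_group = {f"dr_{i}": float(i) for i in range(1, 21)}
print(high_outliers(peer_group))  # {'dr_20'}
print(due_for_review([{"dr_20", "dr_7"}, {"dr_20"}, {"dr_20", "dr_3"}]))  # {'dr_20'}
```

As the medical director in this case points out, a rule like this gives no feedback to the other 95 percent of physicians.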

You  receive  written  responses  from  most  of  the  outlier  physicians.  They  question  the 
validity  of  your  results,  offering  the  following  reasons. 

•  Their  identification  number  represents  a  group  rather  than  an  individual 
physician. 

•  Their  specialty  designation  is  wrong  or  they  are  a  member  of  a  specialty  that 
your  insurer  does  not  distinctly  recognize. 

•  They  provide  chest  X-rays  in  their  office  whereas  most  of  the  other  physicians 
in  their  peer  group  refer  patients  who  need  chest  X-rays  to  a  radiologist. 

•  They  see  more  patients  with  cardiac  or  pulmonary  conditions  than  other 
physicians  in  their  peer  group. 

One  physician,  who  is  the  medical  director  of  a  preferred  provider  organization,  questions 
whether  your  efforts  will  do  much  to  accomplish  your  goal  of  encouraging  more 
appropriate,  cost-effective  care.  The  medical  director  agrees  that  chest  X-rays  are 
probably  overutilized  and  that  this  is  probably  the  case  for  most,  if  not  all,  physicians. 
However,  this  physician  notes  that  your  approach  is  targeted  at  only  5  percent  of 
physicians and that most of these physicians believe there is a good reason for their
deviation from the norm. Even if the surveillance causes some or most of these
physicians  to  change  their  practice  patterns,  the  remaining  95  percent  have  no  incentive 
to  change  since  they  have  not  been  provided  with  any  feedback.  More  important,  the 
appropriate  chest  X-ray  rate  is  not  known,  and  guidelines  are  not  available  informing 
physicians  which  of  the  chest  X-rays  they  are  currently  ordering  or  performing  are  not 
needed. 


DEVELOPING  AND  USING  PROFILES  TO  IMPROVE  MEDICAL  PRACTICE 

The  cases  described  above  highlight  some  of  the  issues  that  must  be  resolved  before  the 
full  potential  of  profiling  can  be  realized.  This  section  explores  what  is  required  to  develop 
valid  and  relevant  profiles  for  various  purposes.  It  focuses  first  on  the  data  that  underpin 
profiling,  considering  how  well  currently  available  databases  satisfy  profiling's  needs.  It 
then  focuses  on  characteristics  of  profiles  and  systems  of  feedback  that  encourage 
providers  to  "buy  in"  and  that  enable  them  to  take  effective  action. 

Obtaining  Data  to  Develop  Valid  and  Relevant  Profiles 

Profiling  requires  three  types  of  information:  (1)  clinically  meaningful  indicators  of  the 
process  or  outcome  of  care;  (2)  information  to  identify  similar  providers  or  to  account  for 
important  differences  across  providers,  such  as  differences  in  patient  case  mix;  and  (3) 
data that can be used to set utilization and outcome rates reflecting appropriate,
cost-effective care. The ability to develop valid and relevant profiles therefore depends on the
availability  of  accurate  and  reliable  sources  of  these  types  of  information.  Databases  now 
used  for  profiling  may  not  be  ideal  in  this  regard  since  they  were  developed  for  other 
purposes. 

Currently,  most  profiling  is  based  on  data  from  large  administrative  databases,  such  as 
claims  files  and  medical  records  (primarily  hospital  discharge  data  and  inpatient  records). 
These  databases  are  good  sources  of  information  about  costs,  billed  services,  and  certain 
types  of  patient  outcome,  such  as  mortality,  complications,  and  readmission  to  the  hospital. 
Consequently,  it  is  not  surprising  that  they  are  most  often  used  to  generate  profiles  for 
utilization  review  and  to  examine  morbidity  and  mortality. 

Although  available  databases  provide  useful  information  for  some  types  of  profiling, 
problems  related  to  making  comparability  adjustments  and  obtaining  adequate 
observations  make  some  of  these  profiles  difficult  to  interpret.  Available  databases  are 
not  well-suited  for  supporting  other  types  of  profiling,  particularly  in  the  areas  of  quality 
improvement and assessment of provider performance. Many of the data that are most
pertinent for these purposes (relevant indicators of the process and outcome of care, and
data that can be used to develop standards-based norms) are either difficult to obtain or
cannot be obtained accurately or reliably through current sources. These data can be
generated, however, through condition-specific encounter forms and other specialized
instruments.

Comparability  Adjustments.  Since  provider  and  patient  factors  can  have  a  substantial 
effect  on  utilization  and  outcome  rates,  profile  results  are  most  meaningful  if  they  compare 
the  practice  patterns  of  similar  types  of  providers  who  care  for  similar  patient  populations. 
To make these comparability adjustments, data are required that can be used to
characterize relevant aspects of both the provider and the patient accurately and
reliably.
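One common way to adjust for patient case mix is indirect standardization: apply the peer group's event rate for each patient category to the provider's own patient mix to get an expected count, then compare observed with expected. The sketch below is a minimal illustration under assumed, hypothetical categories and peer rates; real profiles would use many more strata or a validated risk model.

```python
def observed_to_expected(
    patients_by_category: dict[str, int],
    observed_events: int,
    peer_rate_by_category: dict[str, float],
) -> float:
    """Indirectly standardized ratio of observed to expected events.

    Expected events are what the provider would have if each of its patients
    had the peer group's event rate for that patient's category.
    """
    expected = sum(
        n * peer_rate_by_category[category]
        for category, n in patients_by_category.items()
    )
    return observed_events / expected


# Hypothetical example: a physician whose panel is mostly cardiopulmonary patients.
peer_rates = {"cardiopulmonary": 0.50, "other": 0.10}
panel = {"cardiopulmonary": 80, "other": 20}
ratio = observed_to_expected(panel, observed_events=42, peer_rate_by_category=peer_rates)
print(round(ratio, 2))  # 1.0
```

A crude rate of 42 events per 100 patients might look high against peers who mostly see "other" patients, yet the ratio of 1.0 suggests the physician is in line with peers once case mix is taken into account.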

As  suggested  earlier  in  this  paper  (Case  Study  3),  comparisons  of  utilization  rates  are 
more  meaningful  when  it  is  possible  to  control  for  physician  specialty  and  patient 
condition/diagnosis,  and  to  distinguish  the  referring  or  ordering  physician  from  the 
physician  who  actually  provides  each  service.  Medicare  claims  data  currently  include  these 
data  elements,  but  the  codes  that  are  used  to  describe  physician  specialty  and  patient 
condition/diagnosis  are  not  interpreted  or  used  uniformly.  Administrative  databases  of 
other payers frequently do not include all of this information. Improvements in coding and
modifications to claim forms that would facilitate these comparability adjustments would
not be particularly difficult to achieve, yet they would make claims data much more
supportive of utilization review profiling.
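Distinguishing the ordering physician from the performing physician changes whose count a service is credited to. In this sketch, which assumes a hypothetical claim record with `ordering_id` and `performing_id` fields (not an actual claim format), crediting chest X-rays to the ordering physician puts a physician who takes X-rays in the office and one who refers to a radiologist on the same footing.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    """Hypothetical claim record; field names are illustrative only."""
    service: str
    ordering_id: str
    performing_id: str


def service_counts(claims: list[Claim], service: str, attribute_to: str) -> Counter:
    """Count claims for one service, credited to 'ordering_id' or 'performing_id'."""
    return Counter(getattr(c, attribute_to) for c in claims if c.service == service)


claims = (
    # Dr. A takes chest X-rays in the office: A both orders and performs them.
    [Claim("chest_xray", "A", "A")] * 3
    # Dr. B refers patients to radiologist R, who performs the X-rays.
    + [Claim("chest_xray", "B", "R")] * 3
)
print(service_counts(claims, "chest_xray", "performing_id"))  # Counter({'A': 3, 'R': 3})
print(service_counts(claims, "chest_xray", "ordering_id"))    # Counter({'A': 3, 'B': 3})
```

Counted by performing physician, Dr. B appears to use no chest X-rays at all; counted by ordering physician, both physicians show the same rate.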

Interpretation of other types of profiles may require more sophisticated comparability
adjustments than are necessary for utilization review (McNeil et al. 1992). For example,
comparisons of morbidity and mortality rates (Case Study 1) are challenging because many
provider and patient variables, some known and some unknown, contribute to
complications following a particular procedure. Known variables can often be abstracted
from the hospital record, but this is an expensive process, and it may be unreliable if
clinicians do not always document pertinent findings or if there are errors in abstraction.

Number  of  Observations.  Unless  a  service  is  provided  frequently  or  an  outcome  occurs 
often,  the  number  of  observations  in  a  profile  may  be  insufficient  to  ensure  that 
differences  in  rates  across  providers  are  not  due  to  chance  alone.  Although  currently 
available  databases  contain  information  on  thousands,  in  some  cases  millions,  of  patients, 
the  frequency  with  which  an  individual  physician  performs  a  procedure  over  the  course  of 
a  year  or  the  frequency  with  which  an  adverse  outcome  occurs  following  this  procedure 
may  not  be  high  enough  to  detect  statistically  significant  differences.  To  some  extent,  this 
reflects  the  fact  that  each  claims  file  covers  only  a  portion  of  a  provider's  practice. 
Comprehensive  databases  covering  all  patients  that  a  provider  cares  for  are  more  likely 
to  provide  an  adequate  number  of  observations.  This  could  be  achieved  by  linking  data 
sets  across  payers.  But  even  so,  observations  for  a  particular  service  or  condition  may  be 
insufficient,  requiring  profilers  to  aggregate  utilization  or  outcome  rates  across  groups  of 
providers  rather  than  to  profile  rates  of  individual  physicians. 
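The arithmetic behind this limitation can be shown with an exact binomial calculation. The sketch assumes, for illustration only, a peer complication rate of 3 percent and compares two physicians with the same observed rate of 8 percent but different volumes; it computes the chance of seeing at least that many events if each physician's true rate equaled the peer rate.

```python
from math import comb


def prob_at_least(k: int, n: int, p: float) -> float:
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical peer complication rate of 3 percent.
peer_rate = 0.03
# Same observed rate (8 percent), two different volumes.
small = prob_at_least(2, 25, peer_rate)    # 2 events in 25 procedures
large = prob_at_least(20, 250, peer_rate)  # 20 events in 250 procedures
print(f"{small:.2f}")  # 0.17
```

With 25 procedures the excess could easily be chance (probability about 0.17), whereas with 250 procedures the same rate would be very unlikely under the peer rate (well below 0.001). Pooling cases across payers or across a group of providers is one way to reach the larger volume.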




Relevant Indicators of Process and Outcome. Profiling for assessment of provider
performance and profiling for quality improvement both depend on having data about clinically
meaningful aspects of the process and outcome of care. If the purpose of a profile is to
monitor  compliance  with  a  practice  guideline,  the  profile  should  measure  the  utilization 
rate  for  a  service  that  is  an  integral  part  of  the  guideline's  process  of  care.  For  a  diabetes 
guideline,  for  example,  it  would  be  meaningful  to  follow  the  rate  at  which  providers  order 
glycosylated  hemoglobin  tests  (a  measure  of  overall  blood  sugar  control  that  is  useful  in 
making  decisions  about  therapy)  but  not  the  rate  at  which  they  order  random  blood  sugars 
(which  are  generally  not  helpful  in  making  management  decisions).  To  be  most  helpful 
in  improving  the  quality  of  care,  a  profile  should  measure  aspects  of  the  process  of  care 
that  are  closely  correlated  with  patient  outcomes  and  short-  and  long-term  outcomes  that 
are  substantially  influenced  by  the  process  of  care. 
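As an illustration of such a process measure, the sketch below computes, for one provider, the share of diabetic patients who had at least one glycosylated hemoglobin test during the profiling period. The patient identifiers and the test labels `"HBA1C"` and `"RANDOM_GLUCOSE"` are hypothetical placeholders, not real billing codes.

```python
def hba1c_testing_rate(diabetic_patients: set[str], tests: list[tuple[str, str]]) -> float:
    """Share of a provider's diabetic patients with at least one glycosylated hemoglobin test.

    tests holds (patient_id, test_label) pairs; the labels are hypothetical.
    Random blood sugar tests are deliberately ignored.
    """
    tested = {pid for pid, label in tests if label == "HBA1C" and pid in diabetic_patients}
    return len(tested) / len(diabetic_patients)


diabetics = {"p1", "p2", "p3", "p4"}
tests = [
    ("p1", "HBA1C"),
    ("p1", "HBA1C"),           # repeat tests count once per patient
    ("p2", "RANDOM_GLUCOSE"),  # not the guideline's indicator
    ("p3", "HBA1C"),
    ("p9", "HBA1C"),           # not a diabetic patient in this panel
]
print(hba1c_testing_rate(diabetics, tests))  # 0.5
```

Counting patients rather than tests keeps the indicator tied to whether each patient received the guideline's key service.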

Available  databases  are  not  particularly  good  sources  for  this  type  of  information.  Claims 
files  do  not  contain  information  about  many  pertinent  aspects  of  the  process  of  care,  such 
as medications and preventive services, both of which are often not covered benefits; postoperative
visits,  which  are  included  in  global  fees;  aspects  of  the  history  and  physical  examination, 
which  are  included  in  visits;  and  laboratory  and  imaging  results.  Many  of  the  most 
meaningful  patient  outcomes,  such  as  functional  status  or  condition-specific  indicators  of 
morbidity,  are  also  poorly  documented.  Medical  records  should  be  an  excellent  source  of 
relevant  information  for  profiling.  But  some  of  these  data  may  be  unreliable  since 
physicians  may  fail  to  document  all  of  their  findings  or  all  of  the  services  they  provide. 
Moreover,  since  data  must  be  abstracted,  obtaining  information  from  medical  records  is 
often  costly. 

Other  types  of  databases  have  been  developed  to  provide  more  targeted  information  for 
profiling  (Brand  et  al.  1992).  Patient  satisfaction  instruments  have  been  used  to  assess 
functional  status,  both  generically  and  in  relation  to  particular  chronic  conditions.  Special 
purpose  instruments  have  been  extremely  useful  in  obtaining  information  about  the  process 
and  outcome  of  care  for  particular  conditions.  For  example,  a  condition-specific  encounter 
form  developed  to  measure  compliance  with  a  practice  guideline  covering  emergency 
department  management  of  soft  tissue  injuries  obtained  information  about  elapsed  time 
since  the  injury,  tetanus  immunization  history,  location  and  history  of  the  injury, 
characteristics  of  the  injury,  and  evidence  of  deep  structure  involvement  (Brand  et  al. 
1983).  This  information  was  available  in  only  4  percent  of  abstracted  medical  records. 
Work  is  also  currently  underway  to  obtain  information  from  ambulatory  medical  records 
and  to  develop  and  implement  computer-based  medical  records. 
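A condition-specific encounter form can be treated as a fixed set of required items. The sketch below models the soft tissue injury form described above with hypothetical field names and reports which items were left blank on a record; it is an illustration, not the instrument used in the cited study.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class SoftTissueInjuryForm:
    """Hypothetical encounter form; fields mirror the items listed in the text."""
    hours_since_injury: Optional[float] = None
    tetanus_immunization_history: Optional[str] = None
    injury_location_and_history: Optional[str] = None
    injury_characteristics: Optional[str] = None
    deep_structure_involvement: Optional[bool] = None


def missing_items(form: SoftTissueInjuryForm) -> list[str]:
    """Names of items left blank (None); False or 0 count as recorded answers."""
    return [f.name for f in fields(form) if getattr(form, f.name) is None]


form = SoftTissueInjuryForm(
    hours_since_injury=2.0,
    injury_characteristics="laceration",
    deep_structure_involvement=False,
)
print(missing_items(form))
# ['tetanus_immunization_history', 'injury_location_and_history']
```

Because every item has a defined slot, a blank is distinguishable from a negative finding, something an abstracted medical record often cannot show.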

Appropriate  Utilization  and  Outcome  Rates.  Meaningful  profiles  for  assessing  provider 
performance  require  utilization  and  outcome  rates  reflecting  appropriate,  cost-effective 
care.  These  rates  are  used  as  standards-based  norms  for  determining  whether  a  provider's 
performance meets an accepted standard. Such rates cannot be derived from current
practice patterns of providers, since practice-based norms do not necessarily reflect
appropriate care. In the future, such rates could be generated by profiling providers who
care for patients with particular conditions according to valid practice guidelines (see
PPRC 1992).
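The difference between the two kinds of norm can be made concrete. In this sketch, using hypothetical rates, the practice-based norm is taken to be the peer mean and `standard_rate` stands in for a guideline-derived rate; the value used is an assumed placeholder, since such rates are not yet generally available.

```python
from statistics import mean


def compare_to_norms(own_rate: float, peer_rates: list[float], standard_rate: float) -> dict[str, bool]:
    """Flag whether a rate exceeds a practice-based norm and a standards-based norm.

    The practice-based norm is the peer mean; standard_rate is a hypothetical
    stand-in for a rate derived from a valid practice guideline.
    """
    return {
        "above_practice_norm": own_rate > mean(peer_rates),
        "above_standard": own_rate > standard_rate,
    }


# Hypothetical rates: typical of peers, but above an assumed guideline target.
print(compare_to_norms(0.30, [0.28, 0.35, 0.32, 0.33], standard_rate=0.20))
# {'above_practice_norm': False, 'above_standard': True}
```

A physician can look unremarkable against peers while still exceeding an appropriate rate, which is why practice-based norms alone cannot establish that care meets an accepted standard.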

Enabling  Providers  to  Take  Effective  Action 

The  ultimate  objective  of  profiling  is  to  change  medical  practice  for  the  better.  Although 
payers, consumers, and credentialing bodies can act on profiling results, using them to
choose "good providers" and to sanction "bad apples," this alone will probably do little to
make American health care more appropriate and cost-effective overall. To
play  a  role  in  achieving  that  goal,  profilers  will  need  to  focus  on  providers  as  the  agents 
of  change,  developing  profiles  and  systems  that  will  stimulate  them  to  review  and  improve 
their  patterns  of  practice. 

Developing Actionable Profiles. Providers will probably be reluctant to change their
behavior unless a particular profile convinces them that action is warranted. For profiles
using  practice-based  norms,  this  means  they  need  to  believe  that  their  practice  pattern  is 
actually  different  (or  is  likely  to  be  different)  from  that  of  comparable  providers  caring  for 
comparable  patients  in  comparable  practice  settings.  For  profiles  using  standards-based 
norms,  providers  need  to  believe  that  the  practice  guideline  on  which  the  norm  is  based 
is  valid  and  applies  to  their  patients  and  practice  setting  (see  PPRC  1992). 

Change  also  requires  that  a  profile  be  actionable.  In  other  words,  the  provider  needs  to 
be  able  to  identify  a  specific  change  in  the  process  of  care  that  would  be  likely  to  bring  his 
or  her  practice  pattern  closer  to  the  norm  without  compromising,  or  while  improving, 
patient  outcome.  For  reasons  discussed  earlier  in  Case  Studies  1  and  3,  neither  CABG 
mortality  rates  nor  chest  X-ray  utilization  rates  are  particularly  helpful  in  this  regard. 

These  issues  can  be  addressed,  to  some  extent,  by  improving  the  quality  of  profiles 
themselves.  Where  data  are  available  or  can  be  obtained,  it  may  be  possible  to  develop 
profiles  with  more  sophisticated  comparability  adjustments.  It  may  also  be  possible  to 
develop more actionable profiles by homing in on the specific patient population for whom
the  process  or  outcome  of  care  varies  most  across  providers.  For  example,  profiles 
documenting  high  cesarean  section  rates  do  not  dictate  a  corrective  action.  But  if  these 
high  rates  are  associated  with  low  rates  of  vaginal  births  after  prior  cesarean  section,  then 
the  high  cesarean  section  rates  can  probably  be  lowered  by  encouraging  women  who  have 
had  a  prior  cesarean  section  to  undergo  a  trial  of  labor. 
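The cesarean section example amounts to computing two rates from the same delivery records. The record layout below is hypothetical; the point of the sketch is that the second rate, trial of labor among women with a prior cesarean, suggests a specific action that the overall rate does not.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Delivery:
    """Hypothetical delivery record."""
    prior_cesarean: bool
    trial_of_labor: bool
    cesarean: bool


def cesarean_rate(deliveries: list[Delivery]) -> float:
    """Overall cesarean rate: signals a possible problem but not a remedy."""
    return sum(d.cesarean for d in deliveries) / len(deliveries)


def trial_of_labor_rate_after_cesarean(deliveries: list[Delivery]) -> float:
    """Share of women with a prior cesarean who underwent a trial of labor."""
    prior = [d for d in deliveries if d.prior_cesarean]
    return sum(d.trial_of_labor for d in prior) / len(prior)


deliveries = (
    [Delivery(prior_cesarean=True, trial_of_labor=False, cesarean=True)] * 4
    + [Delivery(prior_cesarean=False, trial_of_labor=True, cesarean=True)]
    + [Delivery(prior_cesarean=False, trial_of_labor=True, cesarean=False)] * 5
)
print(cesarean_rate(deliveries), trial_of_labor_rate_after_cesarean(deliveries))  # 0.5 0.0
```

Here four of the five cesareans are repeat procedures with no trial of labor, so the drill-down points directly at the corrective action described in the text.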

The  Convener  Function.  Effective  change  can  also  be  encouraged  through  proper 
supporting  structures.  Feedback  of  profiling  results  has  been  shown  to  be  successful  in 
improving  provider  performance  (Schoenbaum  and  Murrey  1992).  This  is  particularly  true 
when  the  profiling  entity  acts  as  a  convener,  bringing  providers  together  to  interact  as  a 
group in interpreting profiling results. In situations where the convener function has been
used, providers have been able to compensate somewhat for the limitations of profiles and
to  identify  effective  courses  of  action.  As  described  in  Case  Study  2,  for  example,  they 
have  been  able  to  make  comparability  adjustments  based  on  their  knowledge  of  providers' 
practices,  to  identify  reasonable  target  rates,  and  to  come  up  with  ways  of  achieving  their 
target  rates  without  compromising  patient  outcome.  In  cases  where  the  convener  has  been 
experienced  in  research,  providers  have  been  able  to  identify  aspects  of  the  process  of  care 
that  are  likely  to  be  responsible  for  differences  in  outcome,  leading  to  fruitful  follow-up 
studies  (Keller  et  al.  1990). 

Although  the  convener  function  has  been  used  most  often  in  the  context  of  quality 
improvement,  it  also  has  important  applications  to  utilization  review.  For  many  services, 
inappropriate  utilization  is  likely  to  be  common,  yet  appropriate  utilization  rates  are  not 
known  and  practice  guidelines  identifying  appropriate  and  inappropriate  indications  for 
using  the  service  are  not  yet  available.  In  these  cases,  identifying  outliers  on  the  basis  of 
practice-based  norms  may  overlook  many  providers  who  over-  or  underutilize  the  service. 
Providing  feedback  alone  will  not  give  providers  the  information  they  need  to  take 
corrective  action.  Convening  providers  in  a  community  to  review  the  profiling  results  may, 
in  some  cases,  make  up  for  these  limitations.  Using  the  utilization  rates  as  a  starting  point 
for  discussion,  they  may  be  able  to  identify  reasonable,  if  not  necessarily  appropriate, 
target  rates  and  courses  of  action  to  follow. 

USING  PROFILING  TO  MEET  THE  GOALS  OF  HEALTH  CARE  REFORM 

Profiling  is  clearly  not  the  answer  to  the  nation's  health  care  problems.  Nor  can  its 
potential  in  many  areas  be  realized  at  the  present  time,  primarily  because  of  problems  with 
the  underlying  data.  Yet  profiling  can  probably  play  an  important  role  in  meeting  the 
goals  of  health  care  reform.  It  appears  to  be  a  very  useful  tool  for  uncovering  potential 
problems with the quality and efficiency of care. More important, when put in the hands
of groups of physicians, profiling results can serve as an effective stimulus to action, even
when the profiles themselves are imperfect and when the appropriate approach to care is
not known.

Short-Term  Goals 

In the short term, profiling would seem to have three useful applications, meeting the
needs of physicians, payers, and policymakers. First, it can provide physicians with a tool
to help them respond to Volume Performance Standards at a time when few valid practice
guidelines are available. Profiling can provide physicians with valuable information about
their patterns of practice, describing how they differ from the patterns of others in terms
of utilization and outcome. Although much of this information is currently limited, it
can nonetheless be very helpful to physicians, especially when presented in the context of
a structured group process. In the course of exploring reasons for variation in profiling
rates, physicians may be able to learn when they are providing services that are not
beneficial to patients and when they are not providing services that are. This can help
them identify ways to reduce the volume of certain services without compromising the
quality of patient care.

Second,  profiling  can  help  both  payers  and  physicians  by  improving  the  effectiveness  and 
efficiency of utilization review. Profiling can probably have the greatest impact on
utilization rates, and minimize case-by-case review to the greatest extent, if it focuses
on changing the behavior of the "average" physician. This will require that payers use the
convener  function,  at  least  until  practice  guidelines  and  appropriate  utilization  rates  are 
available.  Physicians  will  have  better  information  to  act  upon  if  claims  data  are  more 
supportive  of  profiling.  As  described  earlier,  this  could  be  achieved  by  improving  coding, 
adding  certain  elements  to  non-Medicare  claim  forms,  and  linking  databases  across  payers. 

Third,  profiling  can  be  a  useful  tool  for  monitoring  the  effect  of  federal  health  policies  on 
medical practice. It will be more effective in this regard as databases become more
comprehensive and as methods are developed to assess the quality of care accurately.

Long-Term  Objectives 

Profiling  potentially  can  play  an  important  role  in  quality  improvement  and  quality 
assessment.  However,  to  use  profiling  for  these  purposes,  other  requirements  must  first 
be  met.  Both  efforts  depend  on  obtaining  data  about  relevant  aspects  of  the  process  and 
outcome  of  care  for  particular  conditions.  It  is  probably  unrealistic  to  expect  that  this 
information  can  be  obtained  from  claim  forms  or  generic  encounter  forms.  More  likely, 
condition-specific  encounter  forms  will  need  to  be  developed  for  this  purpose.  The  burden 
of  completing  these  forms  will  limit  the  number  of  conditions  about  which  providers  and 
patients  can  be  profiled  at  any  given  time. 

Two  additional  conditions  must  be  met  before  profiling  can  be  used  to  make  valid 
assessments  of  the  quality  of  a  provider's  performance  (for  example,  for  purposes  of 
certification,  credentialing,  or  granting  hospital  privileges).  First,  practice  guidelines  that 
can serve as accepted standards of care need to be identified or developed. Second,
appropriate utilization or outcome rates must be ascertained by monitoring use of these
guidelines in actual practice (PPRC 1992).

Data  Requirements  to  Support  Profiling 

Since  the  quality  of  profiles  largely  depends  on  the  data  that  underpin  them,  improvement 
of  available  data  sources  is  a  high-priority  task.  The  critical  steps  listed  below  focus  not 
only  on  administrative  data,  which  serve  as  the  cornerstone  of  profiling,  but  also  on 
sources  that  can  capture  targeted  information  that  administrative  data  cannot  provide. 




•  Methods  need  to  be  developed  to  describe  and  evaluate  the  quality  of  data 
sources,  focusing  on  their  accuracy,  representativeness,  accessibility,  and 
relevance  to  the  objectives  of  profiling.  Some  methodologic  issues  that  should 
be  addressed  include  the  accuracy  and  completeness  of  diagnostic  information, 
the  development  of  valid  methods  to  create  denominator  populations,  and  the 
specificity  of  procedure  coding. 

•  Administrative  databases  should  be  made  more  supportive  of  profiling  by 
improving  coding,  adding  specific  elements  to  non-Medicare  claim  forms,  and 
making  it  possible  to  link  databases  across  payers.  Accurate  and  reliable 
information  about  physician  specialty,  patient  condition/diagnosis,  the 
referring/ordering  physician,  and  the  physician  who  provides  each  service 
should  be  recorded  on  all  claim  forms.3  This  will  facilitate  making 
comparability  adjustments  and  creating  denominator  populations  for  profiling. 

•  An  "all-patient"  database  —  created  by  linking  data  sources  across  payers  — 
should  be  developed  to  provide  more  representative  data  for  profiling.  This  will 
improve  statistical  power  and  allow  comparisons  of  profiles  across  payers  and 
geographic  regions.  Achieving  this  goal  will  require  a  uniform  claim  form; 
uniform  physician  and  patient  identifiers;  and  standardization  of  codes  for 
services,  physician  specialty,  and  patient  condition/diagnosis  (see  Chapter  10  in 
PPRC  1992). 

•  The  development  of  data  sources  that  can  provide  targeted  information  for 
profiling  needs  to  be  supported.  Most  important  in  this  regard  is  the 
development  of  condition-specific  encounter  forms,  which  are  also  required  to 
assess  the  effectiveness  of  practice  guidelines  (PPRC  1992). 

•  Users  of  profiles  would  benefit  from  having  information  that  would  help  them 
assess  the  strengths  and  limitations  of  profiles,  such  as  the  sources  of  data,  the 
quality  of  the  data,  the  number  of  observations,  and  any  comparability 
adjustments  that  were  made.  Guidelines  should  be  developed  identifying  the 
types  of  information  that  would  best  serve  this  purpose  and  the  ways  this 
information  should  be  expressed.  The  federal  government  could  set  an  example 
here  by  including  user  information  with  all  profiling  results  disseminated  by 
Medicare  PROs. 
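Once uniform physician identifiers are in place, linking data sets across payers reduces, at its simplest, to pooling records on that shared key. This sketch assumes hypothetical per-payer procedure counts keyed by the same identifier; without uniform identifiers the keys would not match and the counts could not be combined.

```python
from collections import Counter


def pool_across_payers(*payer_counts: dict[str, int]) -> Counter:
    """Add per-physician procedure counts from several payers.

    Assumes every payer keys its counts by the same uniform physician identifier;
    otherwise the same physician would appear under unrelated keys.
    """
    pooled: Counter = Counter()
    for counts in payer_counts:
        pooled.update(counts)  # Counter.update adds counts rather than replacing them
    return pooled


# Hypothetical counts keyed by a uniform physician identifier.
medicare = {"P1": 12, "P2": 3}
private_insurer = {"P1": 9, "P3": 20}
print(pool_across_payers(medicare, private_insurer))
# Counter({'P1': 21, 'P3': 20, 'P2': 3})
```

Physician P1 has too few cases in either file alone for a stable rate, but the pooled count begins to give a fuller picture of the practice, which is the statistical-power argument for an all-patient database.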


3  If  the  payer  has  a  central  record  of  each  physician's  specialty,  this  data  element  can  be  ascertained  from  the 
physician  identification  number  recorded  on  the  claim  form. 




Developing  an  Infrastructure  to  Support  Profiling 

Achieving  the  short-  or  long-term  goals  of  profiling  requires  an  infrastructure  capable  of 
supporting  three  complementary  activities.  First,  there  must  be  a  mechanism  for  collecting 
the  necessary  data,  developing  valid  and  relevant  profiles  from  these  data,  and 
disseminating  profiling  results  to  appropriate  parties.  Second,  some  entity  needs  to  act  as 
a  convener,  bringing  physicians  in  a  community  together  in  the  context  of  a  structured 
group  process  to  educate  them  about  profiling  results  and  to  help  them  take  effective 
action. Finally, since profiling involves many different constituencies (practitioners, public
and private payers, consumers, and researchers), processes are required to identify and
respond to their diverse, and sometimes conflicting, needs and to facilitate their successful
interaction.

All  these  activities  would  benefit  from  an  infrastructure  capable  of  organizing  data  and 
personnel  in  local  health  care  communities  and  providing  links  between  communities.4 
This  could  be  achieved  through  state  or  local  entities,  such  as  the  Community  Health 
Management  Information  Systems  sponsored  by  the  John  A.  Hartford  Foundation,  that 
have  some  type  of  national  coordination.  Alternatively,  the  structure  could  comprise  a 
national  entity  with  state  or  local  branches,  such  as  the  Health  Expenditure  Board/State 
Consortia/Quality  Board  infrastructure  proposed  in  recent  health  care  reform  legislation 
(S.  1227).  Any  successful  profiling  infrastructure  will  require  a  manpower  strategy  to 
ensure  appropriate  training  and  utilization  of  personnel  at  the  local  level.  In  the  coming 
year,  the  Commission  will  explore  initiatives  in  the  public  and  private  sectors  to  identify 
models  best  suited  to  serving  this  purpose.  Potential  short-  or  long-term  roles  for 
Medicare  carriers  and  PROs  in  the  profiling  infrastructure  will  be  considered. 

The  convener  function  in  profiling  requires  not  only  effective  educational  and  feedback 
techniques,  but  also  an  infrastructure  that  can  encourage  and  enable  independent 
physicians  in  a  community  to  work  together  as  a  group.  As  part  of  its  upcoming  work 
plan,  the  Commission  will  study  group  processes  that  are  now  being  used  to  support 
feedback  and  the  use  of  profiles  by  physicians.  Of  particular  interest  will  be  efforts  by 
conveners  other  than  staff-  and  group-model  HMOs,  such  as  researchers  (e.g.,  the  Maine 
Medical  Assessment  Foundation);  PROs;  independent  practice  associations;  medical 
schools;  and  teaching  hospitals.  Other  potential  applications  of  the  convener  function  will 
also  be  considered.  For  example,  by  creating  "group  practices  without  walls"  (Berenson 
1992),  the  convener  function  could  play  an  important  role  in  managing  care  as  well  as  in 
profiling. 


4  Infrastructures  capable  of  supporting  data  needs  are  discussed  in  more  detail  in  Chapter  10  of  the 
Commission's  Annual  Report  to  Congress  1992  (PPRC  1992). 




Facilitating  interaction  of  the  various  parties  affected  by  profiling  will  require  an 
infrastructure  capable  of  dealing  with  the  difficult  issues  of  confidentiality  and  access  to 
profiling  information.  Some  consumer  advocates  believe  that  the  release  of  profiling 
information  can  protect  the  public  from  poor  quality  practitioners  and  hospitals  by 
providing  patients  with  information  for  choosing  providers  and,  through  the  pressure  of 
public  accountability,  by  stimulating  providers  to  improve  their  performance.  On  the  other 
hand,  sanctioning  providers  on  the  basis  of  profiling  results  may  make  them  less  likely  to 
admit  that  they  need  to  change  their  practices  or  to  use  profiling  information  to  achieve 
quality improvement. As part of its evaluation of infrastructures to support profiling, the
Commission will be particularly interested in determining the extent to which effective
public representation in the profiling process (for instance, in the development,
interpretation, and use of profiling information) might be able to substitute for the
release of profiling information in meeting the public's needs.


REFERENCES 

Berenson, Robert A., "A Physician's View of Managed Care," Health Affairs, 10(4): 106-119,
Winter  1992. 

Brand,  Donald  A.,  Lois  Quam,  and  Sheila  Leatherman,  "Data  Needs  of  Profiling  Systems," 
in  Physician  Payment  Review  Commission,  Conference  on  Profiling,  No.  92-2 
(Washington,  DC:  PPRC,  May  1992). 

Brand,  Donald  A.,  Denise  Acampora,  Louis  D.  Gottlieb,  et  al.,  "Adequacy  of  Antitetanus 
Prophylaxis  in  Six  Hospital  Emergency  Rooms,"  New  England  Journal  of  Medicine, 
309:636-640,  1983. 

Keller,  Robert  B.,  Alice  M.  Chapin,  and  David  N.  Soule,  "Informed  Inquiry  into  Practice 
Variations:  the  Maine  Medical  Assessment  Foundation,"  Quality  Assurance  in 
Health  Care,  2:69-75,  1990. 

McNeil,  Barbara  J.,  Sarah  H.  Pedersen,  and  Constantine  Gatsonis,  "Current  Issues  in 
Profiles:  Potentials  and  Limitations,"  in  Physician  Payment  Review  Commission, 
Conference  on  Profiling,  No.  92-2  (Washington,  DC:  PPRC,  May  1992). 

Physician  Payment  Review  Commission,  Annual  Report  to  Congress  1992,  Chapter  8 
(Washington,  DC:  PPRC,  1992). 

Schoenbaum,  Stephen  C.  and  Katherine  Oates  Murrey,  "Impact  of  Profiles  on  Medical 
Practice,"  in  Physician  Payment  Review  Commission,  Conference  on  Profiling,  No. 
92-2  (Washington,  DC:  PPRC,  May  1992). 






PAPER  NO.  2 


DATA  NEEDS  OF  PROFILING  SYSTEMS 


Authors: 

Donald  A.  Brand,  Ph.D. 
Lois  Quam,  M.A. 
Sheila  Leatherman 


Address: 

United  HealthCare  Corporation 
9900  Bren  Road  East 
Minneapolis,  MN  55440-8001 




Medical  care  in  the  United  States  seems  to  defy  management.  The  staggering  costs,  the 
complexity  of  treatments,  the  scientific  uncertainty  about  optimal  clinical  management,  and 
the  scarcity  of  information  on  cost-effectiveness  have  combined  to  produce  a  health  care 
enterprise  so  vast  and  complex  that  it  appears  unmanageable.  Yet  it  must  be  managed. 
Tools  must  be  developed  to  elucidate  the  practice  of  medicine  and  provide  the  information 
necessary  for  effective  management. 

Profiling  of  medical  practices  is  an  informational  tool  that  has  been  suggested  to  help 
achieve  these  goals  (PPRC  1991).  Profiling  may  be  defined  as  the  analysis  of  rates  of 
events  pertaining  to  the  process  or  outcome  of  medical  care  provided  by  health  care 
practitioners  to  defined  populations.  Rates  may  refer  to  dollars  spent,  number  of  services 
provided,  or  number  of  outcome  events  occurring  per  capita  in  a  given  unit  of  time. 
Health  care  practitioners  include  the  entire  spectrum  of  professionals  who  make  patient 
management  decisions,  such  as  physicians,  dentists,  physician  assistants,  and  nurse 
practitioners. Populations may consist of individuals defined by their use of specific
providers  (e.g.,  physicians,  clinics,  health  plans,  hospitals),  their  eligibility  for  specific 
benefits  (e.g.,  employer  groups,  Medicare  beneficiaries),  or  their  residence  in  specific 
localities  (e.g.,  regions,  states,  or  metropolitan  areas). 
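The rates in this definition share a common form: a count of events divided by the population at risk and the length of the observation period. A minimal sketch, assuming a hypothetical population measured in enrollee-months and rates expressed per 1,000 enrollees per year:

```python
def rate_per_1000_per_year(events: int, enrollee_months: float) -> float:
    """Events per 1,000 enrollees per year, with enrollee-months as the denominator."""
    person_years = enrollee_months / 12
    return 1000 * events / person_years


# Hypothetical: 30 procedures among enrollees contributing 6,000 enrollee-months.
print(rate_per_1000_per_year(30, 6000))  # 60.0
```

Using enrollee-months rather than a head count lets people who were enrolled for only part of the year contribute proportionally to the denominator.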

The  overall  objective  of  profiling  is  to  use  epidemiologic  methods  to  describe  medical 
practices,  monitor  health  outcomes,  and  assess  the  efficiency  and  quality  of  care.  Profiling 
can  provide  managers,  purchasers,  consumers,  regulators,  and  policy  makers  with 
information  to  compare  providers  on  dimensions  of  cost,  utilization,  quality,  and  access. 
Profiling  can  also  be  used  to  assess  physician  performance  for  administrative  purposes  such 
as  selection  for  a  network,  credentialing,  or  payment.  If  properly  carried  out,  profiling  has 
the  potential  to  identify  deficiencies  or  excesses  in  care.  The  resultant  knowledge  can  be 
used  to  support  efforts  to  improve  the  quality  of  care  and  the  efficiency  of  the  health  care 
system. 

Profiling  differs  from  medical  technology  assessment.  The  objective  of  technology 
assessment  is  to  evaluate  the  effectiveness  of  a  given  treatment  or  diagnostic  test  or  to 
determine  the  specific  categories  of  patients  most  likely  to  receive  benefit.  The  objective 
of  profiling  is  to  observe  the  actual  use  of  medical  technology  in  practice  and  measure  the 
extent  to  which  it  is  being  used  appropriately.  In  profiling,  the  independent  variable  is  the 
provider  or  context  of  care,  while  in  technology  assessment  the  independent  variable  is  the 
diagnostic  or  therapeutic  maneuver.  A  profiling  analysis  might  investigate  variations  in 
outcomes  for  a  given  procedure  at  different  hospitals  (Williams  et  al.  1991).  A  technology 
assessment, in contrast, might compare outcomes among patients undergoing different
procedures for a given clinical condition (Roper et al. 1988; Javitt et al. 1991).
Distinguishing  between  these  two  types  of  investigation  is  important  in  order  to  address 
the  data  issues  pertinent  to  profiling  systems.1 

Profiling  is  a  management  tool.  Like  other  management  tools,  profiling  has  advantages 
and  limitations.  Profiling  can  describe  overall  patterns  of  resource  use  and  suggest  areas 
for  improvement  in  efficiency  or  quality  of  care.  Profiling  cannot,  on  the  other  hand, 
elucidate  the  finer  details  of  care  that  may  affect  system  performance.  A  profile  can 
report the rate of breast biopsies among women covered by a particular insurance program
or  treated  by  a  given  physician,  for  example,  but  a  profile  cannot  assess  a  physician's  skill 
in  performing  the  physical  examination  that  prompted  the  biopsy,  the  reasoning  behind  the 
decision  to  order  the  biopsy,  or  the  quality  of  the  physician's  communication  with  the 
patient. 

Valid  profiling  analysis  depends  on  the  availability  of  pertinent  data  and  an  understanding 
of  the  strengths  and  limitations  of  a  given  data  source.  This  paper  will  discuss  data  issues 
relevant  to  profiling,  beginning  with  a  description  and  critique  of  various  measures  of 
medical  practice.  Alternative  data  sources  and  a  framework  for  evaluating  data  quality  will 
be  discussed  next.  Finally,  key  analytical  issues  concerning  the  proper  use  of  data  for 
profiling  will  be  addressed. 

MEASURES  OF  MEDICAL  PRACTICE 

In  profiling,  medical  care  is  represented  as  a  succession  of  discrete  services  delivered  to 
individuals  in  a  specified  population.  This  model  of  medical  care  is  useful  for  management 
functions  such  as  cost  accounting  and  quality  surveillance,  but  the  model  does  not  capture 
the  essence  of  medical  care  as  most  people  experience  it.  The  occurrence  of  an 
intermediate  office  visit  (CPT  code  90060)  (AMA  1991),  followed  by  a  chest  X-ray 
(71030),  an  upper  gastrointestinal  endoscopy  (43234),  then  another  intermediate  office  visit 
(90060),  indicates  a  patient's  interaction  with  the  health  care  system,  and  this  list  of 
services  tells  a  fair  amount  about  the  nature  of  the  interaction.  But  medical  care  is  not 
simply  a  succession  of  events.  It  is  a  continuous  process  that  can  be  profoundly  affected 
by  variables  not  captured  in  units  of  service,  such  as  a  physician's  skill  in  taking  a  history 
or  performing  a  physical  examination,  sensitivity  in  assessing  needs  that  may  not  always 
be  well  articulated  by  the  patient,  ability  and  willingness  to  include  a  patient  in  decision 
making,  and  competence  to  reason  logically  from  evidence.  The  experience  is  also  affected 
by  factors  not  directly  related  to  practice  styles  and  skills  of  individual  clinicians,  such  as 
the structure of benefit packages, the amount of paperwork required to obtain insurance
coverage, and the patient's own decisions and actions. This broader view of medical care
is largely lost in the discrete-event model of profiling.

1 The same data sources may be used for both profiling and technology assessment, but the data issues are
not identical.

As  long  as  the  limitations  of  the  profiling  model  are  recognized,  profiling  can  offer  a  useful 
approach  to  the  measurement  of  medical  practice.  Two  types  of  measures  may  be  applied: 
measures  of  service  and  measures  of  outcome. 

Measures  of  Service 

Profiling  has  traditionally  focused  on  the  quantity  of  services  delivered,  expressed  either 
in  dollars  or  in  units  of  service.  Dollars  are  typically  used  when  the  objective  of  profiling 
is  resource  management.  Dollars  offer  the  advantage  of  a  common  unit  of  measure,  but 
they sacrifice descriptiveness. Dollars also may not reflect the true "value" of the
services represented, because payment rates are distorted across different types of
services. For example, the magnitude of the discrepancy in payments for major surgical
procedures  versus  primary  care  services  may  not  be  justified  based  on  their  relative  values 
(PPRC  1991). 

Reporting  units  of  service,  rather  than  dollars,  provides  greater  insight  into  the  medical 
care  process.  Number  of  office  visits,  number  of  laboratory  procedures,  number  of 
pediatric  vaccinations,  or  number  of  balloon  angioplasties  offer  more  clinically  meaningful 
information  than  number  of  dollars  spent.  Units  of  service  have  been  reported  in  several 
well-known  studies  of  geographic  variations  in  medical  practice.  These  studies  have 
analyzed  variations  in  hospitalization  rates  (Wennberg  et  al.  1989)  and  in  rates  of  specific 
procedures  such  as  tonsillectomy,  coronary  angiography,  carotid  endarterectomy,  and  upper 
gastrointestinal  tract  endoscopy  (Roos  et  al.  1977;  Chassin  et  al.  1987). 

Services  may  be  aggregated  in  various  ways  depending  on  the  purpose  of  the  analysis. 
Total  dollars  per  Medicare  beneficiary  per  year  would  be  a  measure  of  service  utilization 
at  a  global  level.  More  refined  analyses  would  report  dollars  (or  units  of  service)  by 
category  (e.g.,  physician,  laboratory,  radiology,  pharmacy,  emergency  department,  inpatient 
hospital).  Instead  of  dollars  per  year,  other  analyses  might  report  dollars  per  episode  of 
illness  (Salkever  et  al.  1976),  dollars  per  episode  of  care  (Hornbrook  et  al.  1985),  or  dollars 
per  office  visit. 
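The global and category-level measures described above differ only in how claim lines are grouped before dividing by the population. The following is a minimal sketch, assuming hypothetical claim lines of the form (beneficiary, year, category, dollars) and a known enrollment count for each year.

```python
from collections import defaultdict

# Hypothetical claim lines: (beneficiary_id, year, category, dollars).
claims = [
    ("b1", 1990, "physician", 120.0),
    ("b1", 1990, "laboratory", 35.0),
    ("b2", 1990, "physician", 80.0),
    ("b2", 1990, "inpatient hospital", 4200.0),
    ("b3", 1990, "radiology", 150.0),
]
# Hypothetical enrollment: four beneficiaries in 1990, one of whom filed no claims.
enrolled = {1990: 4}

def dollars_per_beneficiary(claims, enrolled, by_category=False):
    """Total dollars per enrolled beneficiary per year, optionally split by service category."""
    totals = defaultdict(float)
    for _beneficiary, year, category, dollars in claims:
        key = (year, category) if by_category else year
        totals[key] += dollars
    result = {}
    for key, total in totals.items():
        year = key[0] if by_category else key
        # Divide by everyone enrolled, not only those with claims, to get a per capita figure.
        result[key] = total / enrolled[year]
    return result

print(dollars_per_beneficiary(claims, enrolled))
print(dollars_per_beneficiary(claims, enrolled, by_category=True))
```

Per-episode measures would additionally require a rule for assigning claim lines to episodes of illness or care, which this sketch does not attempt.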

Service  utilization  may  be  reported  for  specific  subsets  of  patients,  rather  than  for  all 
patients  under  a  provider's  care.  For  example,  a  recent  study  used  profiling  to  investigate 
variations  in  time  spent  and  procedures  performed  during  individual  patient  encounters 
among  rheumatologists  treating  patients  with  rheumatoid  arthritis  (Henke  and  Epstein 
1991).  Numerous  studies  have  used  profiling  to  analyze  rates  of  cesarean  section  in 
pregnant populations (Goyert et al. 1989; Notzon et al. 1987; Taffel et al. 1987).


Neither  dollars  nor  units  of  service  measure  many  important  facets  of  a  patient's 
experience  of  the  care  process.  Patient  satisfaction,  indicators  of  access  such  as 
appointment  waiting  times,  and  patients'  understanding  of  care  processes  are  examples  of 
important  measures  that  have  not  historically  been  included  in  profiling  analyses.  These 
measures  do  not  fit  the  traditional  profiling  model  because  the  pertinent  data  are  primarily 
qualitative  and  do  not  translate  readily  into  rates.  Perhaps  the  traditional  concept  of 
profiling  should  be  expanded  to  accommodate  these  issues.  To  do  so  will  require  use  of 
appropriate  measurement  scales  and  instruments,  development  of  ongoing  data  collection 
systems,  and  a  consensus,  heretofore  lacking,  that  this  information  is  worth  paying  for. 
Methodologic  work  has  already  been  done  to  begin  laying  the  necessary  groundwork  (Ware 
and  Hays  1989;  Ware  and  Davies  1988;  Darby  1991),  and  a  few  organizations  have  begun 
to  monitor  patients'  perceptions  of  care  on  an  ongoing  basis  (Kerr  1989;  Davies  and  Ware 
1988;  Schlackman  1989). 

Measures  of  Outcome 

Outcome  measures  consider  the  ultimate  purpose  of  medical  care,  which  is  the 
maintenance  or  attainment  of  patients'  health.  Knowledge  of  end  results  does  not, 
however,  indicate  what  processes  should  be  altered  or  adopted  to  improve  health. 
Outcome  measurements  that  identify  quality  problems  should  be  accompanied  by 
investigations  of  the  care  process  to  determine  causes  and  suggest  changes  (Berwick  1991, 
1989). 

The  use  of  an  outcome  measure  depends  on  the  subject  and  purpose  of  the  analysis. 
Death  is  a  legitimate  outcome  measure  of  quality  when  variations  in  the  process  of  care 
can  reasonably  be  expected  to  influence  the  death  rate  in  the  study  population.  This  would 
probably be the case when analyzing major surgical procedures or hospitalizations for
potentially life-threatening conditions. When reporting death rates, it is important to
select the appropriate time horizon for a given analysis (for example, in-hospital mortality
vs. mortality within 30 days of admission vs. mortality within 30 days of discharge). An
inappropriate time horizon can bias comparisons of mortality.

Case  Study:  Effect  of  Time  Horizon  on  Analysis  of  Death  Rates.  An  analysis  of 
death  rates  in  selected  hospitalized  Medicare  patients  revealed  a  25  percent 
difference  between  New  York  and  California  in  in-hospital  mortality  (Jencks  et  al. 
1988).  When  death  within  30  days  of  admission  was  considered,  however,  mortality 
was  virtually  identical  in  the  two  states.  Because  the  average  length  of  stay  in 
California  was  only  half  that  in  New  York,  more  patients  in  California  died  after 
discharge  from  the  hospital  (but  within  30  days  of  hospitalization),  and  these 
patients  were  not  counted  as  in-hospital  deaths.  Using  in-hospital  mortality 
produced  a  biased  comparison,  making  New  York  appear  to  have  a  higher  death 
rate  simply  because  the  period  of  data  collection  was  longer  in  New  York. 
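The mechanism in the case study can be reproduced with a toy example. The sketch below uses two hypothetical hospitals, not the New York and California data. Their patients die on identical days after admission, and the hospitals differ only in length of stay. Day numbers, field names, and counts are invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Admission:
    admit_day: int
    discharge_day: int        # for in-hospital deaths, the day of death
    death_day: Optional[int]  # None if the patient did not die during follow-up

def in_hospital_mortality(adms):
    deaths = sum(1 for a in adms if a.death_day is not None and a.death_day <= a.discharge_day)
    return deaths / len(adms)

def mortality_within_days_of_admission(adms, days=30):
    deaths = sum(
        1 for a in adms
        if a.death_day is not None and a.death_day - a.admit_day <= days
    )
    return deaths / len(adms)

def hypothetical_hospital(length_of_stay, death_days, n=10):
    """n admissions on day 0; the first len(death_days) patients die on the given days."""
    adms = []
    for i in range(n):
        death = death_days[i] if i < len(death_days) else None
        discharge = length_of_stay if death is None else min(length_of_stay, death)
        adms.append(Admission(0, discharge, death))
    return adms

# Same deaths on the same days after admission; only the length of stay differs.
long_stay = hypothetical_hospital(12, [5, 9, 20])
short_stay = hypothetical_hospital(6, [5, 9, 20])

for name, adms in [("long stay", long_stay), ("short stay", short_stay)]:
    print(name,
          "in-hospital:", in_hospital_mortality(adms),
          "30-day:", mortality_within_days_of_admission(adms))
```

With the shorter stay, the death on day 9 occurs after discharge. That death therefore drops out of the in-hospital count but remains in the 30-day count.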


Post-surgical  complications  and  hospital  readmissions  are  other  possible  measures  of 
outcome.  Although  readmissions  actually  signal  utilization  of  services,  this  measure  is 
often  used  as  a  surrogate  outcome  measure  on  the  assumption  that  adverse  outcome 
events  prompted  the  patient's  return  to  the  hospital  (Thomas  and  Holloway  1991). 
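A readmission measure needs an explicit counting rule. The sketch below assumes one common convention: a stay counts as followed by a readmission if the same patient is admitted again within 30 days of discharge. The window, the tuple layout, and the data are assumptions made for illustration.

```python
from collections import defaultdict

def readmission_rate(stays, window=30):
    """Share of hospital stays followed by another admission within `window` days of discharge.

    stays: iterable of (patient_id, admit_day, discharge_day) tuples.
    """
    by_patient = defaultdict(list)
    for patient_id, admit, discharge in stays:
        by_patient[patient_id].append((admit, discharge))
    index_stays = readmitted = 0
    for episodes in by_patient.values():
        episodes.sort()  # chronological order by admission day
        for i, (_admit, discharge) in enumerate(episodes):
            index_stays += 1
            next_admit = episodes[i + 1][0] if i + 1 < len(episodes) else None
            if next_admit is not None and next_admit - discharge <= window:
                readmitted += 1
    return readmitted / index_stays if index_stays else 0.0

# Hypothetical stays: p1 returns 15 days after discharge, p2 after 56 days, p3 never.
stays = [("p1", 0, 5), ("p1", 20, 25), ("p2", 0, 4), ("p2", 60, 64), ("p3", 10, 12)]
print(readmission_rate(stays))
```

The sketch ignores two practical complications. Stays near the end of the data period have incomplete follow-up, and some readmissions are planned rather than prompted by adverse events.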

Because  data  on  deaths,  complications,  and  hospital  readmissions  are  relatively  easy  to 
obtain,  these  outcomes  have  been  frequently  reported  in  studies  that  use  profiling.  For 
the  same  reason,  clinical  events  with  established  diagnostic  labels  (e.g.,  late  stage  breast 
cancer)  (Institute  of  Medicine  1990)  have  also  served  as  measures  of  outcome. 
Unfortunately,  these  classes  of  outcomes  are  not  relevant  to  most  clinical  problems,  and 
the  outcomes  that  are  relevant  are  much  more  difficult  to  monitor.  For  acute  illnesses  and 
injuries,  the  length  of  time  to  resolution  or  healing  is  a  more  appropriate  measure.  For 
chronic  conditions,  patient  reported  functional  status  or  quality  of  life  assessments  may 
offer  the  most  insight.  This  extremely  valuable  information  is  noticeably  lacking  from  most 
current  data  systems,  and  capturing  it  will  require  major  new  commitment  and  investment. 

DATA  SOURCES 

Framework  for  Evaluating  a  Data  Source 

Obtaining  data  that  are  reasonably  accurate,  representative,  accessible,  and  relevant  to  the 
objectives  of  profiling  is  a  formidable  challenge. 

Accuracy.  Data  accuracy  implies  several  things:  (1)  an  acceptably  low  rate  of  random 
errors  such  as  the  reversing  of  digits,  (2)  an  absence  of  bias  in  the  acquisition  or  recording 
of  data,  and  (3)  a  reasonably  low  rate  of  missing  values. 

The  need  to  minimize  random  errors  that  occur  in  the  data  collection  process  is  self- 
evident.  Provided  the  error  rate  is  not  excessive,  however,  this  type  of  error  does  not  pose 
a  major  threat,  because  these  errors  tend  to  be  equally  distributed  throughout  the  data. 

Avoiding  bias  in  data  acquisition  is  a  more  critical  requirement.  If,  for  example, 
practitioners  systematically  adjust  their  terminology  when  submitting  insurance  claims  to 
circumvent  benefit  restrictions,  profiling  based  on  claims  data  will  be  inaccurate  and 
biased.  Similarly,  failure  to  submit  insurance  claims  for  certain  services  because  the 
services  are  not  covered  benefits  would  produce  non-random  gaps  in  a  claims  database. 
In  contrast,  five  percent  random  coding  errors  would  produce  slightly  inaccurate  results, 
but  they  would  not  be  biased. 
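A small simulation can illustrate the distinction. It assumes two providers with identical true event rates. It also models an "error" simply as an event missing from the data, which is a simplification, since a transposed digit might instead turn one code into another. The loss rates and sample sizes are hypothetical.

```python
import random

def observed_rate(n_patients, true_rate, loss_prob, rng):
    """Share of patients with a recorded event when each patient has the event with
    probability true_rate and each true event is independently lost with probability loss_prob."""
    recorded = 0
    for _ in range(n_patients):
        has_event = rng.random() < true_rate
        if has_event and rng.random() >= loss_prob:
            recorded += 1
    return recorded / n_patients

rng = random.Random(42)      # fixed seed so the run is reproducible
N, TRUE_RATE = 20_000, 0.30  # identical true rates for providers A and B

# Random errors: 5 percent of events are lost for both providers alike.
random_a = observed_rate(N, TRUE_RATE, 0.05, rng)
random_b = observed_rate(N, TRUE_RATE, 0.05, rng)

# Systematic gap: a hypothetical 40 percent of provider B's events are never billed,
# e.g., because B's patients hold policies that exclude the service.
biased_a = observed_rate(N, TRUE_RATE, 0.05, rng)
biased_b = observed_rate(N, TRUE_RATE, 0.40, rng)

print(f"random errors:  A = {random_a:.3f}, B = {random_b:.3f}, A/B = {random_a / random_b:.2f}")
print(f"systematic gap: A = {biased_a:.3f}, B = {biased_b:.3f}, A/B = {biased_a / biased_b:.2f}")
```

Under random error, both observed rates sit slightly below the true 0.30, which is the "slightly inaccurate" part, and the comparison between providers stays close to even. Only the systematic gap distorts the comparison.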

Finally,  missing  data  can  threaten  the  validity  of  an  analysis.  Frequently  skipped  questions 
on  a  survey,  for  example,  may  have  to  be  excluded  from  analysis  in  order  to  avoid  the 
possibility  of  misleading  results  based  on  skewed  samples. 


Representativeness.  Data  used  to  compute  rates  in  a  profiling  analysis  must  faithfully 
represent  the  patient  and  provider  populations  to  which  the  analysis  refers  and  the  services 
or  relevant  outcomes  pertaining  to  those  individuals.  A  data  source  can  be  considered 
representative  if  it  describes  all  services  or  outcomes  for  all  individuals  in  the  given 
population  or  if  it  contains  an  unbiased  sample  of  these  data.  The  five  percent  research 
sample  of  Medicare  claims,  for  example,  is  a  random  and  therefore  unbiased  sample  of 
all  Medicare  claims.  In  contrast,  data  from  a  single  insurer  are  unlikely  to  be 
representative of a physician's entire practice. A data source need not be complete (that
is, a 100 percent sample of observations) to produce valid results. It does, however,
need  to  contain  a  fair  representation  of  the  population  to  which  it  refers. 
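Simulated data can show the difference between a random sample and a non-random subset. In the sketch below, a simple random draw stands in for a research sample; it does not model how the actual Medicare 5 percent sample is selected. A filtered subset stands in for a single insurer's enrollees. The population, age split, and visit counts are all hypothetical.

```python
import random
from statistics import mean

rng = random.Random(7)  # fixed seed for reproducibility

# Hypothetical population of 100,000 beneficiaries: 20 percent aged 85 or older,
# who average more office visits per year (uniform draws, purely illustrative).
population = []
for _ in range(100_000):
    older = rng.random() < 0.2
    visits = rng.randint(0, 18) if older else rng.randint(0, 10)
    population.append((older, visits))

full_mean = mean(visits for _, visits in population)

# Simple random 5 percent sample: every beneficiary has the same chance of selection.
sample = rng.sample(population, k=len(population) // 20)
sample_mean = mean(visits for _, visits in sample)

# Non-random subset: a hypothetical insurer whose enrollees are all under 85.
insurer_mean = mean(visits for older, visits in population if not older)

print(f"full population:  {full_mean:.2f}")
print(f"5% random sample: {sample_mean:.2f}")
print(f"single insurer:   {insurer_mean:.2f}")
```

The random sample is only one twentieth the size of the full population, yet it tracks the full-population mean closely. The insurer's subset is far larger, but it misses the high-use group entirely, so its mean runs consistently low.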

Accessibility.  Data  may  be  inaccessible  because  they  are  confidential  or  proprietary, 
because  they  are  highly  decentralized,  because  they  are  prone  to  misplacement  (a  problem 
with  conventional  paper  records),  or  because  their  source  may  be  elusive  (a  problem  with 
the  collection  of  survey  data).  Accessibility  problems  vary  markedly  from  one  data  source 
to  another.  Since  the  cost  of  acquiring  data  is,  in  general,  inversely  related  to  its 
accessibility,  cost  considerations  will  almost  certainly  dictate  use  of  the  more  accessible 
data  sources  for  large-scale  profiling  systems. 

Relevance.  To  be  useful  for  profiling,  a  data  source  must  contain  the  variables  relevant 
to  the  goals  of  a  given  analysis,  and  data  must  be  recorded  at  an  adequate  level  of 
precision.  An  investigation  of  the  appropriateness  of  upper  gastrointestinal  endoscopy 
based  on  the  presence  of  specific  clinical  and  historical  indicators  requires  a  data  source 
that  includes  those  indicators.  A  study  of  extreme  immaturity  (birthweight  less  than 
1000 g) using ICD-9-CM codes (Commission on Professional and Hospital Activities 1980)
requires  the  recording  of  at  least  four  digits  of  the  code  to  identify  this  group  of  newborns. 
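In ICD-9-CM, category 765 covers disorders relating to short gestation and low birthweight. The fourth digit separates 765.0 (extreme immaturity) from 765.1 (other preterm infants). The hypothetical filter below shows why a field truncated to three digits cannot identify the group. The function name and input format are assumptions.

```python
def is_extreme_immaturity(icd9_code: str) -> bool:
    """Return True if an ICD-9-CM diagnosis code falls under 765.0 (extreme immaturity).

    Accepts codes with or without the decimal point, e.g. "765.03" or "76503".
    """
    digits = icd9_code.replace(".", "").strip()
    if digits == "765":
        # A three-digit "765" cannot separate 765.0 (extreme immaturity)
        # from 765.1 (other preterm infants), so the record cannot be classified.
        raise ValueError(f"{icd9_code!r} is recorded at too few digits to classify")
    return digits.startswith("7650")

for code in ["765.03", "7651", "769"]:
    print(code, is_extreme_immaturity(code))
```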

Because  no  single  existing  data  source  measures  up  well  to  all  of  these  criteria  (Siu  et  al. 
1991),  an  approach  that  combines  several  sources  may  be  advisable.  In  addition,  the 
quality  of  each  individual  data  source  will  have  to  be  improved  for  profiling  to  realize  its 
full  potential.  It  is  also  worth  considering  the  development  of  new  data  sources  for 
profiling,  although  the  feasibility  of  introducing  new  data  acquisition  processes  is  uncertain 
and  the  cost  could  be  substantial. 

The  appropriate  standard  of  data  quality  depends  upon  the  purpose  for  which  data  will  be 
used.  If  results  are  to  be  considered  as  a  sole  basis  for  judgments  about  quality  of  care, 
physician  payment,  or  physician  credentialing,  then  a  very  high  standard  of  data  quality 
must  be  met.  If  results  are  to  be  used  in  conjunction  with  other  data  or  for  strictly 
descriptive  purposes,  then  a  less  stringent  standard  of  quality  would  be  acceptable. 


Critique  of  Individual  Data  Sources 

The  advantages  and  limitations  of  the  data  sources  most  relevant  to  profiling,  examples 
of  studies  that  have  used  these  data  sources,  and  prospects  for  innovation  aimed  at 
improving  the  data  sources  are  discussed  below. 

Administrative  Databases.  Payment  for  services  covered  by  Medicare,  indemnity 
insurance,  and  many  prepaid  health  plans  depends  on  claims  submitted  by  patients  or 
providers. The administrative data systems that process and house these claims are
an obvious source of data for profiling. Although the structure and content of these
databases  vary  widely  among  payers,  they  share  certain  characteristics  and  have  many 
common  strengths  and  weaknesses. 

Government  administrative  databases  have  been  widely  used  for  research  purposes. 
Research  on  medical  practice  has  been  conducted  using  Medicare  claims  databases  (Jencks 
et  al.  1988),  Medicaid  databases  in  several  states  (Ray  and  Griffin  1989),  and  Canadian 
provincial  databases  (Roos  et  al.  1985).  Several  indemnity  insurance  carriers  and  large 
health  plans  which  contract  with  physicians  use  their  claims  data  to  analyze  physician 
practices.2  Large  staff  model  health  maintenance  organizations  may  also  collect  data  that 
can  be  used  for  profiling.3  Their  salaried  physicians  submit  encounter  records  describing 
services  provided  rather  than  claims  for  payment. 

Claims  data  have  many  attractive  features.  The  routine  collection  and  electronic  storage 
of  claims  data  for  payment  purposes  eliminate  the  usual  research  costs  for  data  acquisition, 
making  this  a  highly  accessible  data  source.  Claims  databases  contain  information  on 
thousands,  in  some  cases  millions,  of  individuals.  Such  large  populations  allow  a  high  level 
of  confidence  in  the  reporting  of  statistical  results  and  make  it  possible  to  investigate 
relatively  rare  events  and  low  prevalence  conditions.  Claims  provide  a  more  complete,  less 
skewed  description  of  an  individual's  care  than  an  account  of  services  delivered  at  a  single 
location,  such  as  a  particular  hospital  or  physician's  office.  Claim  files  contain  information 
on  individuals  over  extended  time  periods,  allowing  for  longitudinal  studies  and  long-term 
follow-up.  Finally,  the  broad  spectrum  of  patients  represented  in  claim  databases  makes 
it possible to study certain segments of the population that are typically excluded from
clinical research, such as the elderly (Steinberg et al. 1990).

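Claims data also have drawbacks. Because people switch forms of insurance coverage,
claims databases describe unstable populations: individuals enter and leave a given payer's
files, so the population being profiled changes over time and a person's record of care may
end when coverage changes. This instability complicates profiling analyses based on
claims data.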


2  The  United  HealthCare  Corporation,  U.S.  HealthCare,  Aetna,  and  Prudential  are  examples  of  companies 
that  have  used  their  databases  for  practice  profiling. 

3  Kaiser  Permanente  and  Group  Health  Cooperative  of  Puget  Sound  are  examples. 


Claim  files  are  incomplete  because  certain  services  do  not  generate  claim  records  for  a 
variety  of  reasons.  First,  certain  categories  of  service  may  be  excluded  from  coverage.  For 
example,  indemnity  insurance  programs  typically  exclude  preventive  services,  and  many 
policies,  including  those  issued  by  Medicare,  do  not  cover  prescription  drugs.  Absence  of 
drug  data  seriously  compromises  the  ability  to  carry  out  certain  types  of  profiling,  given 
the  dominant  role  of  drugs  in  the  practice  of  medicine.  Second,  health  plans  with  defined 
provider  networks  often  do  not  pay  for  services  delivered  by  physicians  outside  the 
network  and  therefore  do  not  receive  claims  for  those  services.  Third,  copayment  and 
deductible  provisions  sometimes  create  situations  that  discourage  claim  submission. 
Certain  drug  benefit  packages,  for  example,  require  a  fixed  copayment  of  several  dollars 
for  each  prescription.  If  the  price  of  a  particular  prescription  falls  below  the  copayment 
amount,  the  patient  pays  the  full  price  out  of  pocket  and  the  pharmacy  has  no  reason  to 
submit  a  claim.  Similarly,  if  a  patient's  annual  covered  medical  expenses  fall  below  a 
policy's  deductible,  the  patient  may  not  submit  those  expenses,  since  there  is  no 
reimbursement  incentive.  Fourth,  because  certain  services  are  billed  as  packages,  it  is  not 
always  possible  to  determine  details  of  the  care  that  was  actually  delivered.  For  example, 
if  an  obstetrician  submits  a  single  claim  covering  prenatal  care  and  delivery,  a  claims 
database  will  not  contain  a  record  of  individual  prenatal  office  visits.  Similarly,  surgeons' 
common  practice  of  including  follow-up  care  in  their  fee  for  an  operation  may  obscure  the 
details  of  follow-up,  including  the  occurrence  of  post-operative  complications. 

These  gaps  in  administrative  data  pose  problems  for  profiling  not  only  because  they  result 
in  an  incomplete  record  of  services,  but  also  because  the  gaps  vary  from  policy  to  policy, 
making  it  difficult  to  understand  the  nature  and  extent  of  the  missing  information. 

Case  Study:  Effect  of  Incomplete  Pharmacy  Data  on  the  Profiling  of  Hypertension 
Care.  A  claims  analysis  was  conducted  to  study  the  medical  care  delivered  to 
patients  with  hypertension  in  a  health  plan.  The  analysis  revealed  that  a  smaller- 
than-expected  proportion  of  pharmacologically  treated  patients  were  taking 
diuretics.  Investigation  of  the  prices  of  drugs  used  to  treat  hypertension  showed 
that  the  price  of  a  prescription  for  many  diuretics  fell  below  $10.  Since  the  health 
plan's  pharmacy  benefit  required  a  $10  copayment  per  prescription,  patients  would 
have  paid  the  full  price  of  these  inexpensive  diuretics  out  of  pocket  and  the 
pharmacy  may  not  have  submitted  claims  for  them.  It  had  to  be  assumed, 
therefore,  that  this  administrative  database  underrepresented  use  of  diuretics  in 
this  population,  but  the  actual  number  of  patients  taking  these  drugs  remained 
unknown. 
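The filtering effect described in the case study can be sketched with a handful of hypothetical prescriptions. The $10 copayment comes from the case study. The drug prices, patient IDs, and the rule that a claim is filed whenever the price reaches the copayment are all assumptions.

```python
from dataclasses import dataclass

COPAYMENT = 10.00  # per-prescription copayment, as in the case study

@dataclass
class Prescription:
    patient_id: str
    drug_class: str
    price: float

def generates_claim(rx: Prescription, copayment: float = COPAYMENT) -> bool:
    # Assumption: below the copayment the patient pays the full price and
    # the pharmacy has no reason to file; at or above it, a claim is filed.
    return rx.price >= copayment

def share_using(prescriptions, drug_class):
    """Share of pharmacologically treated patients with any prescription in drug_class."""
    treated = {rx.patient_id for rx in prescriptions}
    users = {rx.patient_id for rx in prescriptions if rx.drug_class == drug_class}
    return len(users) / len(treated) if treated else 0.0

# Hypothetical prescriptions; prices are illustrative, not actual market prices.
all_rx = [
    Prescription("p1", "diuretic", 6.50),
    Prescription("p2", "diuretic", 8.00),
    Prescription("p3", "beta blocker", 24.00),
    Prescription("p4", "ACE inhibitor", 31.00),
    Prescription("p5", "diuretic", 12.00),
]
claims = [rx for rx in all_rx if generates_claim(rx)]

print("share on diuretics, all prescriptions:", share_using(all_rx, "diuretic"))
print("share on diuretics, claims only:      ", share_using(claims, "diuretic"))
```

Patients p1 and p2 vanish from the claims entirely, not just from the diuretic count. In real claims data, the first line cannot be computed at all, which is why the actual number of patients taking diuretics remained unknown.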

Case  Study:  Effect  of  Incomplete  Office  Visit  Data  on  the  Profiling  of  Ambulatory 
Care.  An  insurer's  analysis  of  physician  office  visits  in  a  metropolitan  area 
revealed  that  one  physician  group  had  a  significantly  lower  than  average  annual 
rate  of  office  visits  per  patient.  The  insurer  wanted  to  investigate  whether  the 
physicians in question were scheduling too few visits or whether the other
physicians were scheduling more visits than necessary. Examination of
patients'  insurance  policies  revealed  that  the  difference  in  visit  rates  may  have  been 
an  artifact  of  different  benefit  packages.  The  physicians  in  question  treated  a  large 
number  of  patients  whose  policies  had  a  deductible.  Since  these  patients  failed  to 
submit  claims  for  medical  services  when  their  annual  medical  costs  fell  below  the 
deductible,  many  office  visits  did  not  appear  in  the  insurer's  claims  database.  As 
a  result,  the  insurer  could  not  determine  from  its  claims  whether  any  real 
differences  in  office  visit  rates  existed  between  the  physician  groups. 

Claim  records  may  be  inaccurate.  Besides  the  unavoidable  problem  of  clerical  errors, 
there  are  three  primary  areas  of  concern.  First,  the  financial  incentive  that  virtually 
guarantees  the  submission  of  claims  for  covered  services  also  compromises  the  objectivity 
of diagnosis and procedure coding. The provider may try to maximize payment, a
phenomenon referred to as upcoding (Berenson and Holahan 1990, 1992) or DRG creep
(Simborg 1981), depending on the context; or the provider may try to help the patient
obtain  reimbursement  by  avoiding  the  use  of  certain  diagnosis  or  procedure  labels  (e.g., 
treatment  for  obesity  and  cosmetic  surgery  are  commonly  excluded  services).  Second,  the 
provider  has  little  incentive  to  be  diligent  about  the  coding  of  diagnoses  in  cases  where  the 
diagnosis  does  not  affect  payment.  The  accuracy  of  diagnoses  recorded  on  insurance 
claims  has  been  studied  (Studney  and  Hakstian  1981),  but  additional  research  is  needed 
to  clarify  this  important  issue.  Third,  the  ICD-9-CM  coding  system  (Commission  on 
Professional  and  Hospital  Activities  1980),  itself,  has  limited  clinical  specificity  (Feinstein 
1988).  These  limitations  must  be  recognized  when  interpreting  profiling  results  that 
classify  patients  by  diagnosis. 

While  insurance  claims  document  what  services  were  delivered,  they  lack  the  information 
about  a  patient's  history,  physical  findings,  and  test  results  that  could  explain  why  those 
services  were  delivered.  This  sparseness  limits  the  ability  of  claims  data  to  elucidate  the 
logic  of  the  care  process. 

Claims  also  provide  only  limited  outcome  data.  They  include  indicators  for  major  outcome 
events  such  as  deaths,  in-hospital  complications,  diagnoses  that  might  be  used  as  outcome 
measures  (e.g.,  late  stage  breast  cancer),  and  services  used  as  surrogate  outcome  measures 
(e.g.,  hospital  readmissions  or  use  of  emergency  services),  but  they  do  not  record  outcomes 
pertinent  to  most  illnesses  and  injuries,  such  as  time  to  recovery  or  healing  or 
improvement  in  functional  status. 

Finally, the accessibility of claims data is limited by the proprietary nature of private U.S.
insurance databases. Unless insurers are willing to share these data and link them to other
databases, the data available from any single insurer may not be representative of
physicians' practices.4 Without such linkages, the ability to carry out longitudinal studies
of medical practices and health outcomes will also be limited. Even if private insurers make
their data available, different procedures for data coding and storage may severely hamper
the successful pooling of data. Also, creating a global description of a patient's care when
services are paid for by several different insurers requires a system for uniquely identifying
individuals to permit the necessary record linkages. At present, each insurer uses a
different system for patient identification.

4 Except for physicians who practice in closed-panel health maintenance organizations.
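The record-linkage requirement can be sketched as an exact match through a shared identifier. Everything here is hypothetical: the member IDs, the crosswalk table, and the `link` function. Probabilistic matching on names and birth dates is an alternative approach and is not shown.

```python
from collections import defaultdict

# Hypothetical claims from two insurers, each keyed by its own member ID.
insurer_a = [("A-001", "office visit"), ("A-001", "chest x-ray")]
insurer_b = [("B-77", "upper GI endoscopy"), ("B-91", "office visit")]

# Hypothetical crosswalk from (insurer, member ID) to a common person identifier.
# This table is what is missing when every insurer uses its own identification system.
crosswalk = {
    ("A", "A-001"): "person-1",
    ("B", "B-77"): "person-1",
    ("B", "B-91"): "person-2",
}

def link(sources, crosswalk):
    """Merge claims from several insurers into one service history per person.

    Returns (histories, unlinked); claims with no crosswalk entry are set aside as unlinked.
    """
    histories = defaultdict(list)
    unlinked = []
    for insurer, claims in sources.items():
        for member_id, service in claims:
            person = crosswalk.get((insurer, member_id))
            if person is None:
                unlinked.append((insurer, member_id, service))
            else:
                histories[person].append((insurer, service))
    return dict(histories), unlinked

histories, unlinked = link({"A": insurer_a, "B": insurer_b}, crosswalk)
print(histories["person-1"])
```

Without the crosswalk, the office visit and chest X-ray paid by one insurer cannot be connected to the endoscopy paid by the other. The result is two partial histories in place of one complete one.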

Numerous profiling studies based on administrative data have been reported over the past
two  decades.  Examples  of  well-respected,  highly  provocative  investigations  include  an 
analysis  of  the  relationship  between  the  volume  of  surgical  procedures  and  postoperative 
mortality  (Luft  et  al.  1979),  reports  of  geographic  variations  in  rates  of  hospitalization  and 
operation  (Wennberg  et  al.  1989;  Chassin  et  al.  1987),  and  accounts  of  differences  in  death 
rates  following  surgery  among  major  teaching  hospitals  in  a  given  city  (Williams  et  al. 
1991).  One  recent  study  compared  rates  of  diagnostic  imaging  procedures  according  to 
whether  primary  physicians  had  imaging  equipment  in  their  offices  or  referred  patients  to 
radiologists  (Hillman  et  al.  1990).  Other  studies  have  analyzed  changes  in  practice  patterns 
over  time  (Taffel  et  al.  1987). 

In  spite  of  their  limitations,  claims  data  systems  provide  a  good  match  to  the  needs  of  the 
profiling  model.  This  match  should  not  be  surprising,  since  the  profiling  method  took  hold 
because  such  data  systems  became  available. 

Medical  Records.  Patient  medical  records  offer  the  most  detailed  and  complete  medical 
care  data  that  are  collected  on  a  routine  basis.  Because  medical  records  document  a 
patient's  history,  physical  findings,  and  test  results,  they  provide  the  relevant  detailed 
information  needed  for  in-depth  analysis  of  the  care  process  and  of  certain  clinical 
outcomes. 

Although  the  medical  record  is  generally  regarded  as  the  best  source  of  clinical  data,  the 
medical  record  has  its  own  deficiencies  (Romm  and  Putnam  1981;  Hendrickson  and  Myers 
1973).  It  may  be  inaccurate,  for  example,  because  physicians  commonly  fail  to  document 
normal  findings  and  certain  services  such  as  patient  education.  Ambulatory  care  medical 
records  tend  to  be  even  less  complete,  and  therefore  less  satisfactory,  for  profiling  than 
inpatient  records. 

Case Study: Missing Data as an Obstacle to Evaluation Using the Medical Record.
Medical  records  of  703  emergency  room  patients  treated  for  open  soft  tissue 
injuries  were  abstracted,  and  the  data  were  used  to  evaluate  quality  of  care  using 
a  clinical  practice  guideline  (Frazier  and  Brand  1979).  The  guideline,  in  the  form 
of  a  clinical  algorithm,  covered  tetanus  immunization,  criteria  for  obtaining  a 
consult, wound preparation, wound closure, immobilization, antibiotics, and follow-up
plan. Determining the appropriate treatment for a given patient required data
on time since injury, tetanus immunization history, location and history of the
injury,  type,  depth,  and  dimensions  of  the  injury,  and  evidence  of  deep  structure 
involvement.  Although  it  is  reasonable  to  expect  this  information  to  be  routinely 
recorded  for  these  injuries,  the  abstracted  medical  record  data  proved  sufficiently 
complete  for  evaluation  in  only  27  of  the  703  cases  (3.8  percent).  Substituting  a 
structured  problem-specific  form  for  the  usual  free-text  medical  record 
subsequently  increased  the  rate  of  auditable  records  to  86  percent. 
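The auditability test in this case study amounts to checking each abstracted record for every data item that the guideline's decision logic needs. The following is a minimal sketch. It assumes each abstracted record is a Python dictionary, and its field names are hypothetical paraphrases of the data items listed above, not the study's actual abstraction form:

```python
# Hypothetical field names paraphrasing the data items the guideline required.
REQUIRED_FIELDS = (
    "time_since_injury",
    "tetanus_history",
    "injury_location",
    "injury_history",
    "injury_type",
    "injury_depth",
    "injury_dimensions",
    "deep_structure_involvement",
)


def is_auditable(record: dict) -> bool:
    """A record is auditable only if no required item is missing or None."""
    return all(record.get(field) is not None for field in REQUIRED_FIELDS)


def auditable_rate(records: list) -> float:
    """Percentage of records complete enough to apply the guideline."""
    if not records:
        return 0.0
    complete = sum(1 for r in records if is_auditable(r))
    return 100.0 * complete / len(records)


full = {field: "documented" for field in REQUIRED_FIELDS}
partial = dict(full, tetanus_history=None)  # immunization history not recorded
print(auditable_rate([full, partial]))  # 50.0
```

A single undocumented item is enough to make a record unusable for this kind of evaluation. That is why free-text records fared so poorly against a guideline requiring this many inputs.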

Medicolegal,  ethical,  and  practical  considerations  may  affect  the  accuracy  of  what 
physicians  include  in  the  medical  record.  They  may  be  reluctant,  for  example,  to  record 
diagnoses  with  a  social  stigma,  such  as  asthma,  alcoholism  or  a  mental  illness.  Regulators 
and  payers  increasingly  review  medical  records  for  cost  containment  and  quality  assurance 
purposes.  This  review  may  affect  the  nature  and  completeness  of  recorded  data.  For 
instance, physicians may be inclined to tailor the way they record information to
help their patients obtain maximum insurance coverage.

The  accessibility  of  medical  record  data  is  limited  by  the  lack  of  a  consistent  format  for 
the recording of information. What data are recorded, where they are placed, and how terms
are defined all vary from one physician to the next. In addition, entries are too often illegible.

Although  the  open-ended  format  of  the  medical  record  hampers  data  extraction,  it  does 
provide  great  flexibility.  The  type  of  data  (numerical,  narrative,  or  pictorial)  and  level  of 
detail  can  be  tailored  to  the  requirements  of  each  particular  case.  The  flexibility  of  the 
medical  record  is  a  strength  that  clinicians  take  for  granted,  and  its  importance  should  not 
be  underestimated  when  proposing  modifications  to  this  data  source. 

Since medical records are rarely stored electronically, representative data have to be
abstracted, coded, and entered into a computer before they can be used for profiling. This
process is error prone (Demlo et al. 1978), time consuming, quite intrusive, and extremely
expensive. The confidentiality of medical record data may also restrict their accessibility for
profiling.

The  Institute  of  Medicine  is  currently  directing  a  major  study  of  the  challenges  and 
opportunities  related  to  computer-based  medical  records  (Dick  and  Steen  1991).  If  this 
and  related  work  leads  to  improvements  in  medical  record  systems,  the  implications  for 
profiling  could  be  significant. 

Surveys  of  Patients  and  Physicians.  Surveys  can  augment  other  sources  of  data  for 
profiling.  Patient  surveys,  for  example,  can  illuminate  relevant  dimensions  of  care  that 
cannot  be  explored  using  routinely  collected  data.  These  dimensions  include  patients' 
perceptions  of  care,  functional  status,  quality  of  life,  and  access  to  care.  Patient  surveys 
offer  the  advantage  of  enabling  managers  to  collect  information  directly  from  the  health 
care  consumer  and  patient.  There  is  no  other  way  to  obtain  this  important  information. 
Validated,  standardized  instruments  have  become  increasingly  available,  allowing 
comparisons  between  sites  and  over  time. 

Patient  satisfaction  instruments  (Davies  and  Ware  1991;  Johnson  and  Mosser  1988)  and 
generic  functional  status  and  quality  of  life  instruments  (Ware  1991)  have  been  developed 
and  are  beginning  to  be  applied  in  diverse  settings  including  health  plans  and  medical 
clinics.  Condition-specific  functional  status  instruments,  such  as  the  Diabetes  Quality  of 
Life  measure  (DCCT  Research  Group  1988)  and  the  Arthritis  Impact  Measurement  Scales 
(Wolfe et al. 1988), are also available. However, no research has yet demonstrated
the application of these functional status instruments to physician profiling.
The  selection  of  appropriate  functional  status  tools  is  not  straightforward.  Generic 
functional  status  scales  may  not  be  sufficiently  sensitive  to  measure  changes  in  patient  well- 
being  (Bindman  et  al.  1990).  Condition-specific  tools,  in  contrast,  may  have  too  narrow 
a  focus  for  profiling  purposes. 

The  National  Center  for  Health  Statistics  has  sponsored  a  survey  of  physicians  known  as 
the  National  Ambulatory  Medical  Care  Survey  (NCHS  1974).  Data  from  this  survey  have 
been  used  in  various  profiling  studies.  For  instance,  one  study  analyzed  differences 
between  internists  and  family/general  practitioners  in  the  time  spent  with  patients  during 
office  visits  and  in  the  use  of  laboratory  and  X-ray  tests  (Noren  et  al.  1980). 

Surveys  have  several  limitations.  They  may  be  inaccurate  because  of  faulty  recall  (Harlow 
and  Linet  1989;  Paganini-Hill  and  Ross  1982)  and  the  Hawthorne  effect  (Simon  1978),  a 
potential  bias  caused  by  the  extra  attention  paid  to  study  subjects  in  the  conduct  of  a 
survey. Diligent data collection is required to obtain an unbiased sample
and a reasonable response rate. Careful construction of valid questions is essential to ensure
that  survey  responses  are  intelligible  and  amenable  to  analysis.  These  efforts  can  be  both 
time-consuming  and  costly. 

Special-Purpose  Instruments.  Instruments  developed  for  specialized  purposes  can  provide 
data for profiling beyond those routinely entered into claims files or medical records or
gathered  via  surveys.  Data  collection  instruments  are  frequently  developed  for  research 
or  quality  assessment  purposes.  The  resultant  data  can  sometimes  be  used  for  profiling. 
A  multicenter  study  that  examined  quality  of  emergency  department  care,  for  example, 
compared  physicians'  use  of  antitetanus  prophylaxis  in  patients  treated  for  open  soft  tissue 
injuries  at  the  six  participating  hospitals  based  on  data  recorded  by  physicians  on  a 
standardized  instrument  (Brand  et  al.  1983). 

The main advantage of special-purpose instruments is their ability to capture relevant
information with great accuracy. The main drawback is that they generally do not lend
themselves to large-scale implementation because of the high cost of development and
implementation, as well as problems in integrating them with existing processes. Special-purpose
instruments are therefore relatively inaccessible sources of data.

Prospects  for  Innovation 

The  problems  that  limit  the  usefulness  of  existing  data  sources  for  profiling  are  extremely 
varied  and  complex,  mirroring  the  health  care  system  that  they  describe.  Solutions  are  not 
readily  at  hand. 

The  representativeness  and  accessibility  of  administrative  data  would  be  significantly 
improved  if  the  various  separate  private  insurance  databases  and  the  Medicare  database 
could  be  combined.  Development  of  a  shared  administrative  database  would  require 
overcoming  the  resistance  of  private  payers  to  pooling  their  data.  It  would  also  require 
establishing  a  standard  set  of  variables  and  coding  conventions.  Challenging  issues  of 
patient  and  provider  confidentiality  and  privacy  must  also  be  confronted.  Although  it 
would  not  be  easy  to  accomplish,  an  all-payer  database  probably  represents  the  most 
promising  innovation  for  the  profiling  of  medical  practices. 

Researchers  in  medical  informatics  have  been  working  for  decades  to  develop  systems  to 
organize,  standardize,  and  digitize  data  recorded  in  medical  records,  yet  this  vexing 
intellectual  problem  has  not  been  solved  (Dick  and  Steen  1991).  As  a  result,  the  vast 
majority  of  inpatient  and  outpatient  medical  records  still  represent  information  as 
unstructured  free  text  recorded  on  paper.  Since  the  profiling  of  medical  practices  cannot 
await  a  solution  to  this  problem,  profiling  systems  will  have  to  rely  on  manual  abstraction 
to  obtain  access  to  medical  record  data.  The  Health  Care  Financing  Administration  is 
currently  field  testing  a  system  for  the  acquisition  of  inpatient  data  (the  Uniform  Clinical 
Data  Set),  and  an  analogous  system  for  ambulatory  care  data  is  under  development 
(Krakauer and Bailey 1991). If successful, these efforts should improve the process of data
abstraction, but it is probably unrealistic to expect near-term improvements in the medical
records themselves that would significantly enhance their utility for profiling.

Patient  surveys  will  become  a  more  relevant  and  accessible  data  source  in  profiling  systems 
if  they  can  be  linked  to  administrative  data.  Linkage  of  the  Medicare  Current  Beneficiary 
Survey  to  the  five  percent  research  sample  of  Medicare  claims  is  an  example  of  this 
(Executive  Office  of  the  President,  Office  of  Management  and  Budget,  Clearance  Package 
for  the  Medicare  Current  Beneficiary  Survey,  Carolyn  Rimes  (Project  Officer)).  A 
standardized  patient  survey  such  as  a  patient  satisfaction  instrument,  administered  to 
representative  populations  over  time,  would  also  provide  relevant  information  to  be  used 
in  profiling  systems. 
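In its simplest form, linking survey responses to claims is a join on a shared beneficiary identifier. The following is a minimal sketch that uses hypothetical identifiers and field names. Real linkage must also handle identifiers that are missing or recorded inconsistently, and it raises the confidentiality concerns noted earlier:

```python
def link_survey_to_claims(survey, claims):
    """survey: {beneficiary_id: response dict}
    claims: list of claim dicts, each with a 'beneficiary_id' key.
    Returns one linked record per survey respondent (hypothetical schema)."""
    claims_by_id = {}
    for claim in claims:
        claims_by_id.setdefault(claim["beneficiary_id"], []).append(claim)
    return {
        bid: {"survey": response, "claims": claims_by_id.get(bid, [])}
        for bid, response in survey.items()
    }


survey = {"B001": {"satisfaction": 4}, "B002": {"satisfaction": 2}}
claims = [
    {"beneficiary_id": "B001", "procedure": "office visit"},
    {"beneficiary_id": "B003", "procedure": "office visit"},  # not surveyed
]
linked = link_survey_to_claims(survey, claims)
print(len(linked["B001"]["claims"]), len(linked["B002"]["claims"]))  # 1 0
```

Keying the result on survey respondents keeps the analysis population defined by the survey sample. Respondents with no claims are retained with an empty claims list rather than silently dropped.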

ANALYTICAL  ISSUES 

Profiling  can  be  used  for  descriptive  and  evaluative  purposes.  For  example,  it  can  describe 
the  services  provided  to  treat  acute  cholecystitis  or  chronic  obstructive  pulmonary  disease, 
as  well  as  evaluate  the  appropriateness  of  those  services.  Descriptive  profiling  is  often 
used  to  compare  patterns  of  care  among  providers,  across  geographic  regions,  or  across 
time  periods.  Evaluative  profiling  requires  the  establishment  of  criteria. 

Evaluative  Criteria 

Two  main  classes  of  criteria  have  been  used  in  profiling  to  evaluate  medical  care: 
statistical  norms  and  practice  guidelines.  Statistical  norms  apply  to  populations  as  a  whole. 
They  are  expressed  as  averages,  rates,  or  proportions.  Norms  represent  benchmarks  for 
evaluating overall patterns of utilization or outcome; they do not directly address the
appropriateness  of  individual  services  for  individual  patients.  Practice  guidelines,  in 
contrast,  apply  to  members  of  a  population  individually,  rather  than  in  aggregate. 
Guidelines  specify  which  services  are  appropriate  on  a  case-by-case  basis.5 

Statistical  norms  are  typically  derived  from  observed  practice  based  on  data  taken  from  an 
individual  facility,  a  local  community,  a  region,  or  the  entire  nation.  An  average  length  of 
stay  in  the  hospital,  a  rate  of  postoperative  complications,  or  a  proportion  of  deliveries  by 
cesarean  section  are  examples  of  statistical  norms  that  could  be  used  for  evaluation. 
Norms may describe typical practice, or they may be based on practices judged to represent
a  community's  highest  standard  of  care.  An  intervention  designed  to  reduce  hospital  stays 
in  Northwest  Ohio,  for  example,  used  statistical  norms  based  on  national  data  gathered  by 
the  Commission  on  Professional  and  Hospital  Activities  (Kincaid  1984).  This  study 
targeted  physicians  whose  patients  had  lengths  of  stay  averaging  more  than  10  percent 
above  the  national  averages  for  similar  patients,  encouraging  these  physicians  to  bring  their 
practices  closer  to  the  norms. 
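The Northwest Ohio screening rule can be expressed as a comparison between each physician's observed mean length of stay and the mean expected for the same patients under the national norms. The following is a minimal sketch. It assumes each stay is tagged with a patient category (for example, a diagnosis group) and that a table of national average stays by category is available. The categories and day counts below are hypothetical and are not data from the study:

```python
from collections import defaultdict

# Hypothetical national average lengths of stay (days) by patient category.
NATIONAL_NORMS = {"pneumonia": 7.0, "hip_fracture": 12.0}


def flag_physicians(stays, norms, threshold=0.10):
    """Return physicians whose patients' stays exceed the stays expected for
    the same patient mix by more than `threshold` (10 percent by default)."""
    observed = defaultdict(float)
    expected = defaultdict(float)
    for physician, category, days in stays:
        observed[physician] += days
        expected[physician] += norms[category]  # case-mix-matched expectation
    return sorted(
        doc for doc in observed
        if observed[doc] > (1 + threshold) * expected[doc]
    )


stays = [
    ("Dr. A", "pneumonia", 9), ("Dr. A", "hip_fracture", 13),
    ("Dr. B", "pneumonia", 7), ("Dr. B", "hip_fracture", 12),
]
print(flag_physicians(stays, NATIONAL_NORMS))  # ['Dr. A']
```

Comparing totals over the same set of stays is equivalent to comparing the observed and expected means, because both totals are divided by the same number of stays. Here Dr. A's 22 days exceed the expected 19 days by about 16 percent.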

An  argument  against  the  use  of  such  historical  averages  is  that  they  can  simply  perpetuate 
a  status  quo  which  includes  some  care  of  questionable  quality.  This  problem  could  be 
overcome  by  establishing  statistical  norms  based  on  the  practices  of  those  providers  whose 
care has been independently judged to represent the best in the community (McAuliffe
1978). Alternatively, norms could be based on the practice of physicians
at institutions of national distinction (Shaller and Gunderson 1986).6 However, the latter
approach may not be valid, because such institutions typically treat selected patient
populations and because an institution's prestige does not ensure that its care is optimal.

When utilization norms are used in profiling, it is customary to equate high utilization with
inappropriate utilization. This notion has been challenged by studies that have used
explicit clinical criteria to assess the appropriateness of surgical procedures in regions with
varying rates of performing those procedures (Chassin et al. 1987; Leape et al. 1990).
These studies found little evidence that the procedures were performed less appropriately
in regions where the rates were higher. It might nevertheless be reasonable to try to
reduce utilization where it is above average, but such efforts should be presented for
what they are: cost containment strategies. Care must also be taken to avoid strategies
that eliminate necessary care.

5 The Institute of Medicine makes a distinction between practice guidelines and medical review criteria,
considering the former to be clinical decision aids and the latter to be tools for evaluation (Institute of Medicine
1990). In this paper, practice guideline will refer more loosely to any explicit set of recommendations for
appropriate care that can be applied to individual patients.

6 Such as the Mayo Clinic.

Practice  guidelines  present  explicit  criteria  for  appropriate  care  based  on  expert  opinion 
and  published  research  rather  than  on  observed  practice.  Guidelines  usually  take  the  form 
of  a  straightforward  list  of  recommendations  for  care.  For  example,  the  American 
Diabetes Association recommends that patients with insulin-treated diabetes be
examined by a physician and have laboratory testing of glycemic control
quarterly, and that they have an eye examination by an ophthalmologist annually (American Diabetes
Association  1989). 
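Applied to administrative data, a list-style guideline such as this becomes a set of per-patient checks. The following is a minimal sketch. It assumes one calendar year of dated service events per patient, already mapped to three hypothetical service labels (`physician_visit`, `glycemic_test`, `eye_exam`). Mapping real procedure codes to these labels falls outside the sketch. Treating a calendar quarter as satisfied by any qualifying service within it is also an illustrative simplification:

```python
from datetime import date


def meets_guideline(events):
    """events: list of (service_label, date) pairs for one patient-year.
    True if a physician visit and a glycemic test occur in every calendar
    quarter and an eye examination occurs at least once (hypothetical labels)."""
    def quarters_with(label):
        return {(d.month - 1) // 3 for s, d in events if s == label}

    all_quarters = {0, 1, 2, 3}
    return (
        quarters_with("physician_visit") == all_quarters
        and quarters_with("glycemic_test") == all_quarters
        and any(s == "eye_exam" for s, _ in events)
    )


events = [("physician_visit", date(1991, m, 15)) for m in (2, 5, 8, 11)]
events += [("glycemic_test", date(1991, m, 15)) for m in (2, 5, 8, 11)]
print(meets_guideline(events))  # False: no eye examination
events.append(("eye_exam", date(1991, 6, 1)))
print(meets_guideline(events))  # True
```

Unlike a utilization norm, each check applies to an individual patient. A provider's profile can then report the share of that provider's patients whose care met every element of the guideline.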

A  practice  guideline  may  also  take  the  form  of  a  detailed  clinical  algorithm  with  intricate 
branching  logic.  For  example,  a  clinical  algorithm  for  the  triage  of  emergency  room 
patients  with  chest  pain  contains  13  decision  points  based  on  the  patient's  age,  the 
location,  quality,  severity,  continuity,  and  time  since  onset  of  pain,  the  history  of  ischemic 
heart  disease,  and  the  electrocardiographic  findings  (Goldman  et  al.  1988). 

Compared  with  practice  guidelines,  statistical  norms  are  relatively  easy  to  establish  and 
straightforward  to  apply,  because  administrative  data  usually  provide  an  adequate  level  of 
detail. In addition, physicians may be more willing to accept evaluative criteria based on
the practices of their peers than criteria based on the pronouncements of expert panels.

The  primary  disadvantage  of  statistical  norms  is  that  they  are  poorly  suited  to  evaluating 
appropriateness,  that  is,  to  detecting  errors  of  commission  (provision  of  unnecessary 
services)  or  omission  (failure  to  provide  necessary  services).  A  particular  provider 
committing  many  errors  of  both  types  might  be  overlooked  because  the  two  types  of  errors 
could  cancel,  giving  his  practice  the  appearance  of  normalcy.  A  second  disadvantage  of 
statistical  norms  is  that  they  tend  to  ratify,  rather  than  improve,  the  baseline  quality  of 
care.  By  definition,  statistical  norms  based  on  average  performance  identify  aberrant 
practices  but  do  not  challenge  usual  and  customary  practices. 
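A small hypothetical panel of patients shows the cancellation effect. Each patient is classified by whether a service was clinically indicated and whether it was provided. The numbers are invented for illustration:

```python
def summarize(panel):
    """panel: list of (indicated, provided) booleans, one per patient.
    Returns (rate of service provision, errors of commission, errors of omission)."""
    n = len(panel)
    provided_rate = sum(p for _, p in panel) / n
    commission = sum(1 for i, p in panel if p and not i)  # unnecessary services
    omission = sum(1 for i, p in panel if i and not p)    # necessary services missed
    return provided_rate, commission, omission


# Each provider sees 100 patients, 30 of whom need the service.
careful = [(True, True)] * 30 + [(False, False)] * 70
careless = ([(True, True)] * 15 + [(True, False)] * 15
            + [(False, True)] * 15 + [(False, False)] * 55)

print(summarize(careful))   # (0.3, 0, 0)
print(summarize(careless))  # (0.3, 15, 15)
```

A statistical norm of 30 percent would judge both practices normal. Only a patient-level criterion that records whether the service was indicated exposes the 30 errors in the second practice.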

Because  practice  guidelines  do  evaluate  appropriateness,  they  have  a  greater  potential  to 
improve  the  baseline  quality  of  care.  The  main  disadvantages  of  practice  guidelines  are 
that  they  have  not  yet  been  developed  by  widely  recognized  authorities  for  many  clinical 
problems,  they  are  both  difficult  and  costly  to  develop  and  maintain,  and  they  require 
detailed  clinical  data  to  be  used  for  evaluation. 

Validity  of  Comparisons 

Profiling  that  compares  the  practices  of  different  providers  is  sometimes  criticized  on 
grounds  that  comparisons  may  not  take  into  account  inter-practice  differences  in  patients' 
severity  of  illness.  Such  differences  may  affect  patients'  need  for  services  and  their  health 
outcomes.  Since  limitations  of  existing  data  sources  make  it  difficult  to  adjust  for  these 
differences,  profiling  results  must  be  interpreted  with  caution. 

Case  Study:  Severity  of  Illness  as  a  Factor  in  Comparisons.  Severity  of  illness  played 
a  subtle  role  in  a  study  comparing  hospital  mortality  in  Boston  and  New  Haven 
(Wennberg  et  al.  1989).  Although  the  death  rate  was  significantly  higher  in  New 
Haven,  the  per  capita  rate  of  hospitalization  proved  to  be  lower  in  New  Haven. 
Less  severely  ill  patients  were  not  as  likely  to  be  hospitalized  in  New  Haven  as 
compared  with  Boston.  As  a  result,  the  average  severity  of  illness  of  patients 
admitted  to  the  hospital  was  higher  in  New  Haven.  Failure  to  recognize  this 
higher  severity  of  illness  might  have  led  to  the  erroneous  inference  of  inferior  care 
in  New  Haven. 
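Invented numbers can illustrate the arithmetic behind this pitfall; they are not the figures from the Boston and New Haven study. Suppose two cities have identical case fatality rates for severely and mildly ill hospitalized patients, but one city admits fewer of its mildly ill patients:

```python
def hospital_mortality(severe_admits, mild_admits, severe_cfr=0.10, mild_cfr=0.01):
    """Crude in-hospital death rate, given admissions by severity stratum and
    stratum-specific case fatality rates (hypothetical default values)."""
    deaths = severe_admits * severe_cfr + mild_admits * mild_cfr
    return deaths / (severe_admits + mild_admits)


# Same severely ill admissions; City Y admits half as many mildly ill patients.
city_x = hospital_mortality(severe_admits=1000, mild_admits=4000)
city_y = hospital_mortality(severe_admits=1000, mild_admits=2000)
print(f"City X: {city_x:.1%}, City Y: {city_y:.1%}")  # City X: 2.8%, City Y: 4.0%
```

Within each severity stratum the two cities perform identically, so City Y's higher crude rate reflects only the more severely ill mix of patients it admits. Comparing stratum-specific rates, or standardizing both cities to a common severity mix, removes the artifact.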

Differences in data recording practices are another factor that could compromise the validity
of  comparisons  involving  different  providers  or  payers.  For  example,  the  number  of  digits 
of  an  ICD-9-CM  diagnosis  code  that  is  routinely  recorded  on  a  claim  could  vary  from 
payer  to  payer  and  physician  to  physician,  making  comparisons  difficult.  The  organization 
of  medical  records  data,  the  nomenclature  used,  and  the  thoroughness  and  level  of  detail 
of  recorded  data  also  vary  among  providers,  leading  to  the  possibility  of  invalid 
comparisons  unless  compensatory  measures  are  taken. 
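One possible compensatory measure is to compare diagnoses only at a level of detail that every source records. The following is a minimal sketch for ICD-9-CM. It assumes codes arrive as strings with or without the decimal point, and it truncates each code to its category: three characters for numeric and V codes, and four for E codes, whose categories are written as E followed by three digits:

```python
def icd9_category(code: str) -> str:
    """Truncate an ICD-9-CM diagnosis code to its category."""
    code = code.strip().upper().replace(".", "")
    width = 4 if code.startswith("E") else 3  # e.g. E880 vs. 250 or V20
    return code[:width]


claims = ["250.01", "25000", "250", "V20.2", "E880.9"]
print([icd9_category(c) for c in claims])
# ['250', '250', '250', 'V20', 'E880']
```

Truncation trades detail for comparability. The complication, diabetes type, and control-status detail coded in the fourth and fifth digits of category 250, for example, is lost for every source, not just the less detailed ones.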

Proper  selection  of  a  study  population  is  essential  to  produce  fair  comparisons.  As  a 
general  rule,  a  study  population  should  consist  of  all  individuals  eligible  to  receive  services, 
rather  than  those  who  actually  obtain  services.  Otherwise,  comparisons  could  be  biased 
by  differences  in  the  frequencies  with  which  medical  care  is  sought  in  the  comparison 
groups.  The  previous  case  study  illustrated  this  problem  with  respect  to  a  measure  of 
outcome.  The  next  case  study  shows  how  the  problem  can  confound  evaluations  based  on 
measures  of  service. 

Case  Study:  Selection  of  Study  Population  as  a  Factor  in  Comparisons.  An  analysis 
of  mammography  rates  in  women  over  40  could  produce  misleading  results  if  rates 
were  computed  based  on  the  population  of  women  who  visited  their  physicians 
during  a  given  year.  In  that  case,  a  health  plan  with  a  low  visit  rate  but  a  high 
mammography  rate  among  those  women  who  had  a  visit  would  seem  to  compare 
favorably  with  another  plan  having  a  higher  visit  rate  but  a  lower  mammography 
rate  among  women  who  had  a  visit.  A  higher  proportion  of  women  in  the  second 
plan  might  actually  be  receiving  mammography. 

This  case  study  illustrates  the  problem  of  selection  bias  that  can  occur  when  populations 
are  defined  by  a  particular  data  source  (e.g.,  office  or  hospital  medical  records,  claims 
databases)  instead  of  by  eligibility  for  services  (e.g.,  membership  in  a  health  plan  or 
employer  group)  (Siu  et  al.  1991). 
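The mammography example can be made concrete with invented numbers for two hypothetical plans, each enrolling 1,000 women over 40:

```python
def mammography_rates(enrolled, visited, screened):
    """Return (rate among women with a visit, rate among all enrolled women)."""
    return screened / visited, screened / enrolled


plan_a = mammography_rates(enrolled=1000, visited=300, screened=180)
plan_b = mammography_rates(enrolled=1000, visited=800, screened=320)

for name, (per_visit, per_enrollee) in [("Plan A", plan_a), ("Plan B", plan_b)]:
    print(f"{name}: {per_visit:.0%} of women with a visit, {per_enrollee:.0%} of enrollees")
# Plan A: 60% of women with a visit, 18% of enrollees
# Plan B: 40% of women with a visit, 32% of enrollees
```

A denominator defined by visits favors Plan A. A denominator defined by eligibility (enrollment) shows that Plan B screens a much larger share of its members.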

Attribution  of  Responsibility 

Profiling  may  require  attribution  of  responsibility  for  care  to  specific  individuals  or 
institutions.  This  may  be  straightforward  in  some  cases,  such  as  an  analysis  of  antibiotic 
use  by  primary  care  physicians  in  the  treatment  of  pharyngitis.  In  other  cases,  attribution 
of responsibility may be problematic. In a study of childhood immunizations, for
example,  is  the  child's  physician  or  the  child's  family  responsible  for  adherence  to  the 
recommended  schedule?  In  a  study  of  upper  gastrointestinal  endoscopy  in  patients  with 
dyspepsia,  is  the  patient's  primary  physician  or  the  specialist  performing  the  procedure 
responsible  for  assuring  that  appropriate  indications  are  present? 

It  may  be  difficult  to  assign  responsibility  simply  because  a  patient  is  seen  by  several 
different  providers.  An  analysis  of  medical  care  for  children  with  asthma,  for  example, 
may  consider  the  use  of  emergency  services  to  be  an  indicator  of  adverse  outcome.  If  a 
child  receives  ongoing  care  from  both  a  pediatrician  and  an  allergist,  it  may  not  be  clear 
which  physician  should  be  considered  responsible.  Similarly,  it  may  be  unclear  which 
physician  should  receive  credit  for  a  screening  mammogram  obtained  by  a  patient  routinely 
seen  by  an  internist  as  well  as  a  gynecologist. 

When  analyzing  claims  data,  arbitrary  rules  are  usually  applied,  typically  assigning 
responsibility  to  the  provider  with  the  most  visits  from  a  given  patient.  This  may  be  a 
reasonable  approach,  but  it  is  important  to  make  such  assumptions  explicit  when  reporting 
results.  Decision  rules  for  attribution  of  responsibility  are  necessary  to  produce 
meaningful  comparisons  when  profiling  is  used  to  assess  quality  or  efficiency  of  care. 
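The common plurality rule assigns each patient to the provider with the most visits. It can be written so that its assumptions, including how ties are broken, are explicit. The following is a minimal sketch with hypothetical provider labels. Breaking ties in favor of the tied provider seen most recently is an illustrative choice, not a standard convention:

```python
from collections import Counter


def attribute_patient(visits):
    """visits: provider IDs for one patient's visits, in chronological order.
    Returns the provider with the most visits; ties go to the tied provider
    seen most recently (an illustrative, explicitly stated rule)."""
    if not visits:
        return None
    counts = Counter(visits)
    top = max(counts.values())
    tied = {provider for provider, n in counts.items() if n == top}
    # Walk backward through the visits to find the most recent tied provider.
    for provider in reversed(visits):
        if provider in tied:
            return provider


print(attribute_patient(["pediatrician", "allergist", "pediatrician"]))  # pediatrician
print(attribute_patient(["internist", "gynecologist"]))  # gynecologist
```

Whatever rule is chosen, reporting it alongside the results, as the text recommends, lets readers judge how attribution may have shaped each provider's profile.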

Sample  Size  Considerations 

Most  profiling  studies  involve  comparisons.  Meaningful  inferences  from  comparative 
studies require a demonstration that observed differences are statistically significant, that
is, that they cannot reasonably be attributed to random variation (Fleiss 1981; Freeman 1987).
Demonstrating  statistical  significance  is  essential  whether  a  study  involves  description  (e.g., 
geographic  variations  in  practice  patterns)  or  evaluation  (e.g.,  individual  physicians' 
utilization  of  services  compared  with  statistical  norms  or  practice  guidelines). 

Such  demonstrations  require  adequate  sample  sizes.  In  some  cases  obtaining  an  adequate 
sample  is  simply  a  matter  of  gaining  access  to  a  sufficiently  large  database.  A  previously 
cited  study  of  the  relationship  between  the  volume  of  surgical  procedures  and 
postoperative  mortality,  for  example,  relied  on  Medicare  claims  files  containing  records 
on  millions  of  beneficiaries  (Luft  et  al.  1979).  In  other  cases,  however,  a  sufficiently  large 
sample  for  statistical  purposes  may  be  inherently  unattainable.  Efforts  to  profile  individual 
physicians'  practices  may  fall  into  this  category,  especially  when  focusing  on  specific 
procedures  or  diagnoses.  The  individual  physician  simply  does  not  see  enough  patients 
(Stafford  1990). 

Case  Study:  Evaluation  Based  on  Small  Sample.  Suppose  that  a  statistical  norm 
for  delivery  by  cesarean  section  were  set  at  150  per  1000  births,  based  on  a  "highest 
standard  of  care"  criterion.  In  other  words,  obstetricians  acknowledged  to  provide 
the  best  quality  care  perform  C-sections  in  about  15  percent  of  their  deliveries. 
Suppose,  furthermore,  that  a  particular  obstetrician  performed  120  deliveries  in  a 
year,  24  of  which  were  C-sections.  His  20  percent  rate  of  C-sections  exceeds  the 
norm by one-third. Nevertheless, the norm falls within the 95 percent confidence
limits (12.8 to 27.2 percent) of the obstetrician's C-section rate. Since the observed rate does
not differ significantly from the norm at the 0.05 level, it would not be
legitimate  to  pass  judgment  unfavorably  on  this  obstetrician's  practice. 
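The confidence limits in this case study match the normal (Wald) approximation to the binomial: p plus or minus 1.96 times the square root of p(1 - p)/n. The following check assumes that approximation is the one intended:

```python
import math


def wald_interval(successes, n, z=1.96):
    """Normal-approximation confidence interval for a proportion (95% for z=1.96)."""
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


low, high = wald_interval(24, 120)
norm = 0.15
print(f"95% CI {low:.1%} to {high:.1%}")  # 95% CI 12.8% to 27.2%
print("norm inside interval:", low <= norm <= high)  # norm inside interval: True
```

The half-width of roughly 7 percentage points shows how little 120 deliveries can reveal. For smaller samples or rates near 0 or 1, exact binomial limits are generally preferable to this approximation.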

The  illustration  in  the  above  case  study  is  especially  revealing  because  cesarean  section  is 
the  most  common  major  surgical  procedure  performed  in  U.S.  hospitals  (Stafford  1990). 
Making  inferences  about  appropriateness  would  be  that  much  more  difficult  for  a  less 
common  procedure,  or  for  incomplete  samples  of  a  physician's  practice  obtained  from  a 
single  payer's  administrative  database. 

Even  large  samples  do  not  guarantee  that  observed  differences  are  statistically  significant. 
A  recent  critique  of  the  statistical  methods  used  in  studies  of  geographic  variation,  for 
example,  noted  that  these  studies  have  almost  always  reported  a  large  difference  between 
the  highest  and  lowest  utilization  rates,  implying  some  underuse  or  overuse  of  services 
(Diehr  1984).  The  author  challenges  the  implication,  however,  pointing  out  that  "...the 
highest  rate  is  always,  by  definition,  higher  than  the  lowest  rate,  and  the  differences  can 
be  surprisingly  large  by  chance  alone."  Appropriate  statistical  methods  must  be  used  to 
determine  whether  differences  are,  in  fact,  significant  (Willemain  1982). 
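A simulation can demonstrate Diehr's point. Give many areas exactly the same underlying rate and examine the spread of the observed rates. The following is a minimal sketch in which the number of areas, the population per area, and the true rate are arbitrary illustrative values:

```python
import random


def simulate_area_rates(n_areas=20, population=500, true_rate=0.02, seed=1):
    """Observed event rates per 1,000 in areas that all share one true rate."""
    rng = random.Random(seed)
    rates = []
    for _ in range(n_areas):
        events = sum(rng.random() < true_rate for _ in range(population))
        rates.append(1000 * events / population)
    return rates


rates = simulate_area_rates()
print(f"lowest {min(rates):.0f}, highest {max(rates):.0f} per 1,000")
```

Every simulated area has the same true rate of 20 per 1,000, yet the highest observed rate will usually be well above the lowest. Any high-low contrast in real data should therefore be tested against this kind of chance variation before overuse or underuse is inferred.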

CONCLUSIONS 

While  profiling  has  the  potential  to  improve  the  management  of  medical  care,  significant 
data  problems  must  be  addressed  before  meaningful  profiling  results  will  be  available  on 
a  broad  scale.  The  present  paper  has  discussed  data  quality  and  analytical  issues  that  pose 
threats  to  the  validity  of  profiling  analyses. 

Practical  considerations  dictate  that  administrative  data  will  serve  as  the  cornerstone  of 
any  profiling  system  that  encompasses  a  significant  portion  of  medical  care  in  the  United 
States.  The  improvement  of  administrative  data  for  the  needs  of  profiling  should  therefore 
be  considered  a  high  priority  task.  Concrete  steps  should  be  undertaken  to  standardize 
coding  systems  and  data  structures  so  that  it  will  be  possible  to  combine  data  from  private 
and  public  payers.  Concerted  attention  should  be  paid  to  resolving  other  obstacles  to  data 
linkage  including  the  development  of  policies  and  procedures  to  address  data  privacy  and 
proprietary  ownership  concerns. 

Methodologic research on claims data should be conducted. All data sources contain errors that must be
measured, understood, and considered in the analysis and interpretation of findings.
Unlike  many  other  data  sources,  claims  data  currently  lack  a  body  of  literature  to  guide 
this  work.  The  accuracy  and  completeness  of  the  capture  of  diagnostic  information,  the 
development  of  valid  methods  to  create  denominator  populations,  and  the  specificity  of 
procedure  coding  are  some  of  the  methodologic  issues  to  be  addressed. 

The  Agency  for  Health  Care  Policy  and  Research  is  currently  sponsoring  a  project  which 
will  compare  Medicare  claims  and  medical  records  as  sources  of  data  for  assessing  the 
quality  of  ambulatory  care  (Heather  Palmer,  principal  investigator,  Grant  Number  1  R01 
HS06469).  Data  from  random  samples  of  office-based  medical  records  will  be  used  to 
assess  the  validity  of  quality  review  based  on  claims.  This  is  an  example  of  the  type  of 
methodologic  work  that  is  needed  to  elucidate  the  strengths  and  weaknesses  of  available 
data  sources.  Considerably  more  work  is  needed  in  order  to  maximize  the  advantages  and 
minimize  the  shortcomings  of  claims  data  in  physician  profiling  systems. 

Finally,  the  feasibility  and  utility  of  including  data  from  other  sources  such  as  medical 
records,  patient  surveys,  and  special  purpose  instruments  should  be  carefully  studied. 
These  additional  sources  describe  important  aspects  of  care  that  cannot  be  elucidated  from 
claims  data.  Pilot  studies  aimed  at  linking  these  data  sources  with  administrative  data,  as 
well  as  studies  that  consider  the  development  of  new  data  sources,  should  be  undertaken. 

The  seriousness  of  the  health  care  crisis  in  the  United  States  demands  the  development 
and  proper  application  of  informational  tools  to  monitor  and  influence  the  practice  of 
medicine.  Improving  data  accuracy,  reducing  sources  of  bias,  and  linking  and  pooling  data 
from  diverse  sources  will  permit  global  assessments  and  scientifically  defensible  conclusions 
from  profiling  that  are  not  presently  attainable.  If  successful,  these  efforts  would  make 
possible  the  implementation  of  statistical  quality  control  systems  to  manage  and  improve 
medical  care  so  that  those  receiving  and  financing  the  care  can  enjoy  the  greatest  possible 
benefit. 

Acknowledgements:  We  would  like  to  thank  Gerald  Lutgen  for  his  insights  during  the 
formative  stages  of  this  paper. 

REFERENCES 

American  Diabetes  Association,  "Standards  of  Medical  Care  for  Patients  With  Diabetes 
Mellitus,"  Diabetes  Care  12(5):365-368,  1989. 

American  Medical  Association,  Physicians'  Current  Procedural  Terminology  (Chicago: 
American  Medical  Association,  1991). 

Berenson,  R.  and  J.  Holahan,  "Sources  of  Growth  in  Medicare  Physician  Expenditures," 
Journal  of  the  American  Medical  Association,  in  press,  1992. 

Berenson,  R.  and  J.  Holahan,  Using  a  New  Type  of  Service  Classification  System  to  Analyze 
the  Growth  in  Medicare  Physician  Expenditures,  1985-1988,  Working  Paper  3983-01 
(Washington,  DC:  Urban  Institute,  1990). 

Berwick, D.M., "The Double Edge of Knowledge," Journal of the American Medical
Association  266:841-842,  1991. 

Berwick,  D.M.,  "Continuous  Improvement  as  an  Ideal  in  Health  Care,"  The  New  England 
Journal  of  Medicine  320:53-56,  1989. 

Bindman, A.B., D. Keane, and N. Lurie, "Measuring Health Changes Among Severely Ill
Patients:  The  Floor  Phenomenon,"  Medical  Care  28:1142-1152,  1990. 

Brand, D.A., D. Acampora, L.D. Gottlieb, et al., "Adequacy of Antitetanus Prophylaxis in
Six  Hospital  Emergency  Rooms,"  The  New  England  Journal  of  Medicine  309:636-640, 
1983. 

Chassin, M.R., J. Kosecoff, C.M. Winslow, et al., "Does Inappropriate Use Explain
Geographic  Variations  in  the  Use  of  Health  Care  Services?  A  Study  of  Three 
Procedures,"  Journal  of  the  American  Medical  Association  258:2533-2537,  1987. 

Commission  on  Professional  and  Hospital  Activities,  International  Classification  of  Diseases, 
Ninth  Revision,  Clinical  Modification  (Ann  Arbor:  Edwards  Brothers  Inc.,  1980). 

Darby,  M.,  "Patient  Information  Used  to  Measure  Quality,"  Report  on  Medical  Guidelines 
and  Outcomes  Research  2(14):5,  1991. 

Davies,  A.R.  and  J.E.  Ware,  GHAA's  Consumer  Satisfaction  Survey  and  User's  Manual,  2d 
ed.  (Washington,  DC:  Group  Health  Association  of  America,  Inc.,  1991). 


Davies, A.R. and J.E. Ware, "Involving Consumers in Quality of Care Assessment," Health
Affairs 7(1):33-48, Spring 1988.


DCCT  Research  Group,  "Reliability  and  Validity  of  a  Diabetes  Quality-of-life  Measure  for 
the  Diabetes  Control  and  Complications  Trial  (DCCT),"  Diabetes  Care  11:725-732, 
1988. 

Demlo,  L.K.,  P.M.  Campbell,  and  S.S.  Brown,  "Reliability  of  Information  Abstracted  from 
Patient's  Medical  Records,"  Medical  Care  16:995-1005,  1978. 

Dick,  R.S.  and  E.B.  Steen,  The  Computer-Based  Patient  Record:  An  Essential  Technology 
for  Health  Care  (Washington,  DC:  National  Academy  Press,  1991). 

Diehr,  P.,  "Small  Area  Statistics:  Large  Statistical  Problem,"  American  Journal  of  Public 
Health  74:313-314,  1984. 

Feinstein, A.R., "ICD, POR, and DRG: Unsolved Scientific Problems in the Nosology of
Clinical Medicine," Archives of Internal Medicine 148:2269-2274, 1988.

Fleiss,  J.I.,  Statistical  Methods  for  Rates  and  Proportions  (New  York:  John  Wiley,  1981). 

Frazier, W.H. and D.A. Brand, "Quality Assessment and the Art of Medicine: The
Anatomy of Laceration Care," Medical Care 17:480-490, 1979.

Freeman,  D.,  Applied  Categorical  Data  Analysis  (New  York:  Marcel  Dekker,  1987). 

Goldman, L.E., F. Cook, D.A. Brand, et al., "A Computer Protocol to Predict Myocardial
Infarction in Emergency Department Patients with Chest Pain," The New England
Journal of Medicine 318:797-803, 1988.

Goyert,  G.L.,  S.F.  Bottoms,  M.C.  Treadwell,  et  al.,  "The  Physician  Factor  in  Cesarean 
Birth  Rates,"  The  New  England  Journal  of  Medicine  320:706-709,  1989. 

Halvorson,  G.C.,  D.J.  Mangen,  and  K.M.  Cooney,  "The  Myth  of  Micro-Data,"  HMO 
Practice  5:178-182,  1991. 

Harlow,  S.D.  and  M.S.  Linet,  "Agreement  between  Questionnaire  Data  and  Medical 
Records:  The  Evidence  for  Accuracy  of  Recall,"  American  Journal  of  Epidemiology 
129:233-247,  1989. 

Hendrickson,  L.  and  J.  Myers,  "Some  Sources  and  Potential  Consequences  of  Errors  in 
Medical  Data  Recording,"  Methods  of  Information  in  Medicine  12:38-45,  1973. 


Henke,  C.J.  and  W.V.  Epstein,  "Practice  Variation  in  Rheumatologists'  Encounters  with 
Their  Patients  Who  Have  Rheumatoid  Arthritis,"  Medical  Care  29:799-812,  1991. 

Hillman,  B.J.,  C.A.  Joseph,  M.R.  Mabry,  et  al.,  "Frequency  and  Costs  of  Diagnostic 
Imaging  in  Office  Practice:  A  Comparison  of  Self-Referring  and 
Radiologist-Referring  Physicians,"  The  New  England  Journal  of  Medicine 
323:1604-1608,  1990. 

Hornbrook,  M.C.,  A.V.  Hurtado,  and  R.E.  Johnson,  "Health  Care  Episodes:  Definition, 
Measurement  and  Use,"  Medical  Care  Review  42:163-218,  1985. 

Institute  of  Medicine,  Clinical  Practice  Guidelines:  Directions  for  a  New  Program 
(Washington,  DC:  National  Academy  Press,  1990). 

Javitt, J.C., J.K. Canner, M.M. Kolb, et al., "Outcomes of Cataract Extraction: Analysis
of Surgical Complications Using Medicare Claims Data," presented at the Thirteenth
Annual Meeting of the Society for Medical Decision Making, October 1991.

Jencks,  S.F.,  D.K.  Williams,  and  T.L.  Kay,  "Assessing  Hospital-Associated  Deaths  from 
Discharge  Data:  The  Role  of  Length  of  Stay  and  Comorbidities,"  Journal  of  the 
American  Medical  Association  260:2240-2246,  1988. 

Johnson, J.A. and G. Mosser, "A Patient Satisfaction Survey of FFS and HMO
Hospitalized Patients," GHAA Journal 8:53-62, 1988.

Kerr,  C.E.,  "Seeking  Employee  Perceptions  of  Quality  of  Health  Care,"  Quality  Review 
Bulletin  15:198-202,  1989. 

Kincaid,  W.H.,  "Changing  Physician  Behavior:  The  Peer  Data  Method,"  Quality  Review 
Bulletin  10:238-242,  1984. 

Krakauer,  H.  and  R.C.  Bailey,  "Epidemiologic  Oversight  of  the  Medical  Care  Provided  to 
Medicare  Beneficiaries,"  Statistics  in  Medicine  10:521-540,  1991. 

Leape, L.L., R.E. Park, D.H. Solomon, et al., "Does Inappropriate Use Explain Small Area
Variations in the Use of Health Care Services?" Journal of the American Medical
Association 263:669-672, 1990.

Luft,  H.S.,  J.P.  Bunker,  and  A.C.  Enthoven,  "Should  Operations  Be  Regionalized?  The 
Empirical  Relation  Between  Surgical  Volume  and  Mortality,"  The  New  England 
Journal  of  Medicine  301:1364-1369,  1979. 


Mandelblatt,  J.,  H.  Andrews,  J.  Kerner,  et  al.,  "Determinants  of  Late  Stage  Diagnosis  of 
Breast  and  Cervical  Cancer:  The  Impact  of  Age,  Race,  Social  Class  and  Hospital 
Type,"  American  Journal  of  Public  Health  81:646-649,  1991. 

McAuliffe,  W.E.,  "On  the  Statistical  Validity  of  Standards  Used  in  Profile  Monitoring  of 
Health  Care,"  American  Journal  of  Public  Health  68:645-651,  1978. 

National  Center  for  Health  Statistics,  The  National  Ambulatory  Medical  Care  Survey: 
Background  and  Methodology,  DHEW  Publication  No.  (HRA)76-1335,  (Washington, 
DC:  DHEW,  1974). 

Noren,  J.,  T.  Frazier,  I.  Altman,  et  al.,  "Ambulatory  Medical  Care:  A  Comparison  of 
Internists  and  Family-General  Practitioners,"  The  New  England  Journal  of  Medicine 
302:11-16,  1980. 

Notzon,  F.C.,  P.J.  Placek,  and  S.M.  Taffel,  "Comparisons  of  National  Cesarean  Section 
Rates,"  The  New  England  Journal  of  Medicine  316:386-389,  1987. 

Paganini-Hill,  A.  and  R.K.  Ross,  "Reliability  of  Recall  of  Drug  Usage  and  Other 
Health-Related  Information,"  American  Journal  of  Epidemiology  116:114-122,  1982. 

Physician Payment Review Commission, Annual Report to Congress 1991 (Washington, DC:
PPRC, 1991).

Ray, W.A. and M.R. Griffin, "Use of Medicaid Data for Pharmacoepidemiology," American
Journal of Epidemiology 129:837-849, 1989.

Romm,  F.J.  and  S.M.  Putnam,  "The  Validity  of  the  Medical  Record,"  Medical  Care 
19:310-315,  1981. 

Roos,  L.L.,  S.M.  Cageorge,  E.  Austen,  et  al.,  "Using  Computers  to  Identify  Complications 
After  Surgery,"  American  Journal  of  Public  Health  75:1288-1295,  1985. 

Roos,  N.P.,  L.L.  Roos  Jr.,  and  P.D.  Henteleff,  "Elective  Surgical  Rates:  Do  High  Rates 
Mean  Lower  Standards?  Tonsillectomy  and  Adenoidectomy  in  Manitoba,"  The  New 
England  Journal  of  Medicine  297:360-365,  1977. 

Roper, W.L., W. Winkenwerder, G.M. Hackbarth, et al., "Effectiveness in Health Care:
An Initiative to Evaluate and Improve Medical Practice," The New England Journal
of Medicine 319:1197-1202, 1988.


Salkever, D.S., P.S. German, S.M. Shapiro, et al., "Episodes of Illness and Access to Care
in the Inner City: A Comparison of HMO and Non-HMO Populations," Health
Services Research 11(3):252-270, 1976.

Schlackman,  N.,  "Integrating  Quality  Assessment  and  Physician  Incentive  Payment,"  Quality 
Review  Bulletin  15:234-237,  1989. 

Shaller,  D.  and  S.  Gunderson,  "Setting  Benchmarks  for  Cost-Effective  Care,"  Business  and 
Health  3(10):28-32,  October  1986. 

Simborg,  D.W.,  "DRG  Creep:  A  New  Hospital-Acquired  Disease,"  The  New  England 
Journal  of  Medicine  304:1602-1604,  1981. 

Simon,  J.L.,  Basic  Research  Methods  in  Social  Sciences:  The  Art  of  Empirical  Investigation, 
2d  ed.  (New  York:  Random  House,  1978). 

Siu, A.L., E.A. McGlynn, H. Morgenstern, et al., "A Fair Approach to Comparing Quality
of Care," Health Affairs 10(1):62-75, Spring 1991.

Stafford,  R.S.,  "Cesarean  Section  Use  and  Source  of  Payment:  An  Analysis  of  California 
Hospital  Discharge  Abstracts,"  American  Journal  of  Public  Health  80:313-315,  1990. 

Steinberg,  E.P.,  J.  Whittle,  and  G.F.  Anderson,  "Impact  of  Claims  Data  Research  on 
Clinical  Practice,"  International  Journal  of  Technology  Assessment  in  Health  Care 
6:282-287,  1990. 

Studney,  D.R.  and  R.  Hakstian,  "A  Comparison  of  Medical  Record  with  Billing  Diagnostic 
Information  Associated  with  Ambulatory  Medical  Care,"  American  Journal  of  Public 
Health  71:145-149,  1981. 

Taffel,  S.M.,  P.J.  Placek,  and  T.  Liss,  "Trends  in  the  United  States  Cesarean  Section  Rate 
and  Reasons  for  the  1980-85  Rise,"  American  Journal  of  Public  Health  77:955-959, 
1987. 

Thomas,  J.W.  and  J.J.  Holloway,  "Investigating  Early  Readmission  as  an  Indicator  for 
Quality  of  Care  Studies,"  Medical  Care  29:377-394,  1991. 

Ware,  J.E.,  "Conceptualizing  and  Measuring  Generic  Health  Outcomes,"  Cancer 
61:114-119,  1991. 

Ware,  J.E.  and  A.R.  Davies,  "Patient's  Perspectives  on  the  Quality  of  Medical  Care," 
Journal  of  Family  Practice  26:489-490,  1988. 


Ware,  J.E.  and  R.D.  Hays,  "Methods  for  Measuring  Patient  Satisfaction  with  Specific 
Medical  Encounters,"  Medical  Care  26:393-402,  1989. 

Wennberg,  J.E.,  J.L.  Freeman,  R.M.  Shelton,  et  al.,  "Hospital  Use  and  Mortality  Among 
Medicare  Beneficiaries  in  Boston  and  New  Haven,"  The  New  England  Journal  of 
Medicine  321:1168-1173,  1989. 

Willemain, T.R., "On the Comparison of Highest and Lowest Surgery Rates in Small-Area
Studies," in I. Rothberg ed., Regional Variation in Hospital Use (Lexington: Lexington
Books, 1982).

Williams,  S.V.,  D.B.  Nash,  and  N.  Goldfarb,  "Differences  in  Mortality  from  Coronary 
Artery  Bypass  Graft  Surgery  at  Five  Teaching  Hospitals,"  Journal  of  the  American 
Medical  Association  266:810-815,  1991. 

Wolfe, F., S.M. Kleinheksel, M.A. Cathey, et al., "The Clinical Value of the Stanford
Health Assessment Questionnaire Functional Disability Index in Patients with
Rheumatoid Arthritis," Journal of Rheumatology 15:1480-1488, 1988.


PAPER  NO.  3 


CURRENT  ISSUES  IN  PROFILES:  POTENTIALS 
AND  LIMITATIONS 


Authors: 

Barbara  J.  McNeil,  M.D.,  Ph.D. 
Sarah  H.  Pedersen,  M.B.A. 
Constantine  Gatsonis,  Ph.D. 


Address: 

Department  of  Health  Care  Policy 
Harvard  Medical  School 
25  Shattuck  Street,  Parcel  B 
Boston,  MA  02115 


Because of increasing evidence that not all physicians provide the same process and quality
of care for their patients, there has been interest in understanding more about how
processes of care differ. On the research side, this interest has translated into projects
like the Patient Outcomes Research Teams (PORTs), which attempt to understand the
relationship between different processes of care and outcomes of care. On the management
side, the interest is reflected in activities that document the extent to which practice
patterns differ and then explore the financial implications of these differences. Research
tends to focus on questions like: "Does an ICU stay of 3 days instead of 4 days lead to the
same outcomes for critically ill surgical patients, and at what cost?" or "Should all
patients, or only some patients, have cardiac catheterization after an acute myocardial
infarction?" Management focuses on questions like: "Is doctor X better than doctor Y?" or
"Should we hire doctor Y for our Preferred Provider Organization (PPO) network?"

Answers to these questions have been facilitated by several factors. First, continuing cost
pressures and the introduction of the Prospective Payment System in 1983 gave hospitals
both the need and the opportunity to monitor the resource utilization of their providers in
relation to Diagnosis-Related Group (DRG) payments for Medicare beneficiaries. Second, the
need for hospitals to negotiate prices with Health Maintenance Organizations (HMOs) and
PPOs led to scrutiny of the accuracy of utilization profiles for patients of all ages.
Third, marketing pressures and the Continuous Quality Improvement movement have led to
increased efforts to evaluate processes and quantify outcomes. Finally, active work by
health services researchers has led to more refined approaches to the collection, analysis,
and interpretation of data on processes and outcomes of care.

This paper provides an overview of physician profiling activities as they are now being
carried out by a variety of groups in this country. It then discusses a series of issues
related to data collection, analysis, categorization of physicians, and standards of
comparison in an attempt to identify the limits of such profiling activities. Finally, it
indicates what further research needs to be undertaken to enrich the use of profiles and
which kinds of profiles are most appropriate at present.


CURRENT  ACTIVITIES 


Background  and  Applications 

Comparative performance measures for health care institutions were proposed by E.A.
Codman in 1916 (Codman 1916); they were begun nationally by the Health Care Financing
Administration (HCFA) in 1987, and by some states (e.g., California) even earlier. Their
use has been associated with considerable controversy, illustrated best by a recent New York
State case in which the state ordered the health department to go beyond the release of
hospital-specific mortality rates and to release physician-specific mortality rates for
patients having bypass surgery (Vibbert 1991). The Health Department of New York has been
concerned that the "data could be misunderstood and misused by the public" (Vibbert 1991).
Sidney Wolfe, director of the Health Research Group, a Washington-based consumer
organization, on the other hand, has said: "There's no question in my mind this would
cause immediate changes in the surveillance of physicians with high mortality rates." He
called the state's severity adjustments "virtually unassailable" (Vibbert 1991).

Physician profiling is conducted by a wide range of institutions for a variety of purposes.
The term is used loosely and carries many different meanings. For the purposes of this
paper, it means the collection, analysis, and use of data pertaining to physician practice
patterns. Profiles have been developed to assist with:

•  Provider feedback and education - Usually these reports are prepared with the
   expectation that they will improve quality and with the hope that high-utilization
   providers will modify their behavior. This paper will focus on this use.

•  Physician selection - This occurs in three ways: by patients in their search for a
   physician; by physicians in their identification of physicians to whom to refer
   patients; and by managed care groups in their selection of physicians.

•  Physician compensation - These reports are used by managed care organizations
   and proprietary chains, such as HealthStop, to help determine salary increases for
   physicians or withholds for physicians in independent practice associations.

•  Provider marketing - Profiling tools exist that allow physicians and hospitals to
   identify marketing opportunities. For example, software packages are being
   offered by organizations such as the Medical Group Management Association that
   allow physicians to compare their charge and utilization patterns to those of their
   peers to identify areas for income enhancement. At least one statewide database
   is also being sold to enable hospitals to research the marketing potential of new
   physician services.

•  Punitive sanctions - These types of reports tend to be generated only when prior
   educational interventions have failed.


Measures  Used  in  Profiles  to  Date 


Despite  the  enormous  range  of  physician  profiling  reports,  it  is  possible  to  categorize  them 
according  to  whether  they  are  thought  to  measure  predominantly  utilization  or 
predominantly  quality.  The  distinction  between  utilization  and  quality  is  frequently 
blurred,  however,  particularly  in  instances  when  utilization  rates  are  used  to  measure 
quality.  The  degree  of  sophistication  of  profiles  varies  tremendously. 

Utilization.  Gross  utilization  measures  are  the  basis  of  most  profiling  reports.  Examined 
on  the  ambulatory  side  are  items  such  as  charges/subscriber,  charges/visit  (or  other  time 
period),  number  of  tests  or  procedures/visit  (or  other  time  period),  numbers  of  referrals 
to  other  primary  or  specialty  providers,  numbers  of  patient-generated  visits  to  additional 
primary  care  providers,  numbers  of  visits  outside  of  an  agreed-upon  bundle  (e.g.,  as  in 
obstetrical  care),  percent  of  brief,  intermediate  and  extended  visits  per  provider,  etc.  On 
the  inpatient  side,  gross  measures  are  days/subscriber,  charges/day,  charges/case,  and 
length of stay (LOS). More refined reports subdivide utilization by specialty, by
department, or by DRG or ICD-9 codes, with separate breakdowns for ambulatory and
inpatient activity when appropriate. On the ambulatory side, variables such as
tonsillectomy  and  cataract  surgery  rates,  number  of  endoscopy  claims  per  internist,  and 
number  of  radiologic  examinations  per  family  practitioner  have  been  examined.  On  the 
inpatient  side,  common  measures  are  hysterectomy  and  cesarean  section  rates,  length  of 
stay  for  an  uncomplicated  delivery,  and  percentage  of  elective  surgical  admissions  with 
more  than  one  pre-operative  day. 
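
To make the gross ambulatory measures concrete, the sketch below computes two of them,
charges per visit and tests or procedures per visit, for each provider from claim line
items. It is a minimal illustration only: the Claim record and its fields are hypothetical
stand-ins for whatever a given claims file actually contains, and a visit is assumed to be
identified by a single encounter ID.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Claim:
    provider: str   # hypothetical provider identifier
    visit_id: str   # hypothetical encounter identifier
    charge: float   # billed charge for this line item
    is_test: bool   # True if the line item is a test or procedure


def utilization_profile(claims):
    """Return visits, charges/visit, and tests/visit for each provider."""
    charges = defaultdict(float)
    tests = defaultdict(int)
    visits = defaultdict(set)
    for c in claims:
        charges[c.provider] += c.charge
        tests[c.provider] += int(c.is_test)
        visits[c.provider].add(c.visit_id)  # each encounter counted once
    return {
        p: {
            "visits": len(v),
            "charges_per_visit": charges[p] / len(v),
            "tests_per_visit": tests[p] / len(v),
        }
        for p, v in visits.items()
    }


claims = [
    Claim("A", "v1", 50.0, False),
    Claim("A", "v1", 20.0, True),
    Claim("A", "v2", 50.0, False),
    Claim("B", "v3", 60.0, False),
]
print(utilization_profile(claims))
```

Counting distinct encounter IDs rather than claim lines matters: provider A has three
line items but only two visits, so its charges per visit are 120/2, not 120/3.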

Quality.  In  general,  profiling  reports  on  quality  tend  to  be  more  in  the  research  and 
development  mode  than  are  their  resource  counterparts.  Conceptually,  although  quality 
measures  can  include  a  variety  of  process  and  outcome  characteristics,  most  of  them  have 
focused  on  explicit  process  issues  (Table  3-1)  (Steinwachs  et  al.  1989).  The  use  of  implicit 
process  measures  which  monitor  the  occurrence  of  specified  events  (e.g.,  "clearly  meets  the 
contemporary  national  standards  of  general  internal  medicine")  has  been  proposed 
(Sanazaro  and  Worth  1985);  however,  these  have  been  confined  to  settings  in  which 
medical  records  or  their  abstracts  can  be  routinely  reviewed. 

On  the  ambulatory  side,  process  measures  tend  to  focus  upon  compliance  with  accepted 
standards  for  the  care  of  chronic  conditions,  preventive  services  and  screening.  Frequently 
these  have  involved  examining  whether  the  recommended  number  of  tests,  procedures  or 
referrals  occurred  during  a  given  time  period;  for  example,  whether  diabetic  patients  have 
at  least  one  annual  glucose  test  and  ophthalmologic  visit  (McCoy  et  al.  1992);  and  whether 
recommended  influenza  shots,  Pap  smears,  and  mammograms  occurred  for  target 
populations.  Also  measured  in  ambulatory  care  are  hospitalization  or  complication  rates 
after  ambulatory  procedures,  inappropriate  emergency  room  usage,  and  medications 
prescribed  despite  contraindications. 
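
A process measure of this kind reduces to a simple proportion: of a physician's diabetic
patients, how many had both a glucose test and an ophthalmologic visit during the year. The
sketch assumes the claims have already been rolled up into one set of hypothetical service
labels ("glucose_test", "eye_visit") per patient; mapping real CPT or ICD-9 codes to those
labels is the difficult part and is not shown.

```python
from collections import defaultdict

REQUIRED = {"glucose_test", "eye_visit"}  # hypothetical service labels


def diabetes_compliance(patients):
    """patients: iterable of (physician, services), one entry per diabetic
    patient, where services is the set of labels recorded during the year.
    Returns each physician's share of patients who received every REQUIRED service."""
    met = defaultdict(int)
    total = defaultdict(int)
    for physician, services in patients:
        total[physician] += 1
        if REQUIRED <= services:  # subset test: all required services present
            met[physician] += 1
    return {p: met[p] / total[p] for p in total}


panel = [
    ("X", {"glucose_test", "eye_visit"}),
    ("X", {"glucose_test"}),
    ("Y", {"glucose_test", "eye_visit", "flu_shot"}),
]
print(diabetes_compliance(panel))  # {'X': 0.5, 'Y': 1.0}
```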


Table 3-1.  Monitoring Quality of Care Based on Measures Derived from a Medical
            Information System

Quality of Care Issue
    Measure / Example / Criteria

Access
    Measure:   Proportion of population receiving care during the year, classified by
               age and sex
    Example:   Percent of children under age 2 seen for at least one well-care visit
    Criteria:  National
    Example:   Percent of children seen in emergency rooms for any reason, for trauma,
               and for medical problems
    Criteria:  Trends

Preventive
    Measure:   Proportion of population in specific age and sex groups receiving
               recommended tests or procedures
    Example:   Percent of children by group having recommended immunizations in
               previous year
    Criteria:  National recommendation
    Example:   Percent of women age 50 and over having mammography in past year
    Criteria:  National recommendation
    Example:   Percent of deliveries with prenatal care beginning in first trimester
    Criteria:  National recommendation

Diagnosis
    Measure:   % of population diagnosed (and under care) for specific chronic
               conditions by age and sex
    Example:   Percent of adults diagnosed at one or more visits as having essential
               hypertension, by age and sex
    Criteria:  Epidemiologic data on prevalence of hypertension

Treatment
    Measure:   Medications: average number of new prescriptions per person per year
    Example:   Average number of new prescriptions for antibiotics per person per year
    Criteria:  Trends and comparison data
    Measure:   Surgery: rate of surgical procedures per year; total, inpatient, and
               ambulatory (if applicable)
    Example:   Cesarean section rate for all deliveries
    Criteria:  Trends and comparison data

Outcomes
    Measure:   Hospital readmissions within 3 months of discharge
    Example:   Percent of readmissions for same condition
    Example:   Percent of readmissions identifying a complication
    Criteria:  Comparison data and trends

Source:  Reproduced in part from: Steinwachs, D.M. et al. Management Information Systems and Quality.
In: N. Goldfield and D.B. Nash. Providing Quality Care: The Challenge to Clinicians. Philadelphia:
American College of Physicians, 1989, Chapter 6, 160-180.


On the inpatient side, the more basic reports examine mortality, morbidity, and
readmission rates per physician. The more sophisticated reports, such as those used by
one large hospital chain, measure items such as mortality and/or complication rates for
catheterizations, arteriograms, hip replacements, knee replacements, fusions,
endarterectomies, and prostatectomies; GI endoscopy biopsy rates; hysterectomies and
cesarean sections whose rationale is not documented; birth traumas; and ventilator usage
in special care units. One Peer Review Organization (PRO) is attempting to identify
outcome measures that might be available from inpatient billing data and that might
contribute to a composite index of quality of care (Alsip 1990).

Production  of  Reports 

The  production  of  reports  can  be  considered  from  at  least  two  perspectives.  How  are  the 
reports  generated?  And,  what  databases  are  utilized  to  create  them? 

Generation of Reports. Report production ranges from a decentralized, regional process to
an entirely centralized one. One hospital chain takes an intermediate approach: it
produces all reports centrally and then distributes them to individual hospitals for
analysis. Timing also varies, from quarterly at one chain, to a few months after the close
of the fiscal year at another, to even longer for some research projects.

A  number  of  the  organizations  surveyed  have  adopted  a  "nesting"  approach  to  report 
generation.  Initial  runs  are  aimed  at  providing  the  users  with  a  broad  overview. 
Subsequent  reports  allow  them  to  investigate  potential  problems  in  greater  detail.  One 
insurer,  for  example,  will  first  produce  reports  on  rates  for  a  given  geographic  area,  then 
subdivide  this  by  specialty,  then  use  age/sex  adjustments,  and  finally  (in  some  instances) 
use  procedure  or  diagnosis  codes.  Although  different  logic  is  used,  many  groups  appear 
to  have  similar  capabilities. 
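
The "nesting" approach can be sketched as the same rate computed at successively finer
levels of grouping, so that a high rate at the top level can be traced to the subgroup
driving it. The field names (area, specialty, events, enrollees) and the figures are
hypothetical, and the age/sex adjustment and code-level steps the insurer adds later are
omitted from this sketch.

```python
from collections import defaultdict


def nested_rates(records, levels):
    """Compute event rates at successively finer levels of detail.

    records: dicts holding grouping fields plus 'events' and 'enrollees'
             (hypothetical field names).
    levels:  tuples of grouping fields, coarsest first.
    Returns {level: {group: events / enrollees}}.
    """
    report = {}
    for keys in levels:
        events = defaultdict(int)
        enrollees = defaultdict(int)
        for r in records:
            group = tuple(r[k] for k in keys)
            events[group] += r["events"]
            enrollees[group] += r["enrollees"]
        report[keys] = {g: events[g] / enrollees[g] for g in enrollees}
    return report


records = [
    {"area": "North", "specialty": "IM", "events": 10, "enrollees": 100},
    {"area": "North", "specialty": "FP", "events": 30, "enrollees": 100},
    {"area": "South", "specialty": "IM", "events": 5, "enrollees": 50},
]
for level, rates in nested_rates(records, [("area",), ("area", "specialty")]).items():
    print(level, rates)
```

At the first level the North area's rate (0.2) is twice the South's; the second level shows
that the difference comes from the FP group rather than from IM, whose rate is the same in
both areas.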

Strategies to ensure comparability of patients vary tremendously. In many instances, no
adjustments are made, or only adjustments based on age and sex. Some argue that case-mix
adjustments are unnecessary because most differences wash out over time and that, even if
they do not, correcting for them may not be worth the expense. This attitude tends to
prevail within organizations that employ profiles merely to screen for subsequent, more
focused medical record reviews. Others, particularly those using the data for direct
feedback to providers, argue that the adjustments are essential to win user acceptance of
the reports. On the ambulatory side, case-mix adjustments use various ICD-9, CPT-4, and
NDC codes (Table 3-2) (Weiner 1991). On the inpatient side, adjustments appear confined to
DRGs, diagnoses, procedures, or groups of diseases/procedures; however, New York State is
using procedure-based data coupled with data on hospital volume to compare hospitals with
regard to surgical mortality rates (Hannan et al. 1991). Only one hospital chain is even
considering a system that measures severity of illness.
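
The age/sex adjustment mentioned above is often done by indirect standardization: apply
the comparison population's rate for each age/sex stratum to the physician's own patients,
sum those rates to get an expected count, and compare it with the observed count. The
sketch below shows that calculation under stated assumptions; the strata and reference
rates are hypothetical, and the text does not say which adjustment method the surveyed
organizations actually use.

```python
def observed_to_expected(patients, reference_rates):
    """Indirect age/sex standardization for one physician.

    patients:        list of (stratum, had_event); stratum is e.g. ("F", "65+").
    reference_rates: {stratum: event rate in the comparison population}.
    Returns (observed, expected, O/E ratio); a ratio above 1 means more events
    than the physician's age/sex mix alone would predict.
    """
    observed = sum(1 for _, had_event in patients if had_event)
    expected = sum(reference_rates[stratum] for stratum, _ in patients)
    return observed, expected, observed / expected


reference = {("F", "<65"): 0.10, ("F", "65+"): 0.30}  # hypothetical rates
panel = [(("F", "<65"), False), (("F", "<65"), False),
         (("F", "65+"), True), (("F", "65+"), False)]
print(observed_to_expected(panel, reference))
```

Here one event is observed against about 0.8 expected, an O/E ratio of about 1.25. A
physician with an older panel is not penalized simply for its age mix, although, as the
text notes, age and sex capture little of the severity of illness.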

Despite the ability to profile at the individual physician level, such reports are not
always generated, because of small sample sizes and the inability to attribute resource use
to particular physicians. In addition, some organizations have learned that they get a
better compliance response to standard reports when data are presented on a plan-versus-plan
basis rather than on a physician-versus-plan or physician-versus-physician basis.


Table 3-2.  An Overview of Five Ambulatory Case-Mix Systems

Name                                Unit of Analysis                Number     Key Grouping Variables
                                                                    of Groups

Diagnosis Clusters                  Primary diagnosis/visit         100        ICD-9-CM, ICPC, or ICDA-8

Ambulatory Visit Groups (AVGs)      Visit/encounter                 570        ICD-9-CM, CPT-4, age, sex,
                                                                               new/established patient status

Ambulatory Patient Groups (APGs)    Visit/encounter                 297        ICD-9-CM, CPT-4, disposition

Products of Ambulatory Care (PACs)  Visit/encounter                 23         ICD-9-CM, CPT-4, age, sex,
                                                                               clinic type, drug administration

Ambulatory Care Groups (ACGs)       Patient (all care over 1 year)  54         ICD-9-CM or ICD-9, age, sex

Source:  Reproduced from: Weiner, J.P., Ambulatory Case-Mix Methodologies: Application to Primary Care
Research. In: Hibbard, Nutting, and Grady, eds. Primary Care Research: Theory and Methods. US DHHS,
September 1991, Table 2.

Current Data Sources. The variables used in current profiles are generally based on
claims data or billing files and reflect the constraints of these databases.1 Thus, on the
resource side they capture only those variables used for billing and may not accurately
identify physicians, who ordered what, or the source of referrals. On the quality side,
claims-based profiles monitor mortality, readmissions, and some morbidities. To make up for
the shortcomings of claims data in assessing resource use and quality of care, some
profiles also draw on additional sources of data, e.g., medical records. The recently begun
ambulatory DEMPAQ study, built on the experience of the Ambulatory Care Medical Audit
Demonstration Project (Palmer et al. 1985), will provide important data on the potential
of these alternative sources for routine profiling (Palmer et al. 1992). In addition to
claims data and medical record data, many organizations are developing profiles that also
incorporate information from patient outcome surveys. The Medical Outcomes Study (MOS)
provides an example of the kinds of data on physical, role, and social functioning, mental
health, and health perceptions that can be obtained in a research study of ambulatory
patients with chronic illness (Stewart et al. 1989). Organizations such as Brigham and
Women's Hospital (Boston) are experimenting, on a pilot basis, with the collection of
detailed outcome data on inpatients from medical records and patient surveys. Finally,
there are satisfaction surveys. Usually these are of patients; GHAA, for example, makes
available to HMOs a satisfaction survey developed by Ware and Davies and also provides
them a software package for analysis. Sometimes, as in one HMO, the "internal customers"
of physicians, e.g., their colleagues, referring physicians, and support staff, are the
source of satisfaction data.

1 Other sources of data are also being used in profiles. Utilization review data are used by
insurers, hospitals, and PROs. In some instances, external databases are also tapped. For
example, the Illinois Hospital Association database captures data on malpractice claims and
state Drug Enforcement Agency actions. These applications will not be discussed further.


DATA NEEDS


General Considerations

In general, if profiles are to be used for serious attempts at quality improvement, data
collection will need to be substantially different from, and more detailed than, what
currently occurs. At the same time, there should be little opportunity for "gaming,"
particularly for data elements that are to be used for case-mix adjustment. Individual
elements should be reliable and valid (see the first paper in this document). For the
elderly, the validity of data elements takes on additional importance, because a finding
that would be abnormal in a younger person may occur so frequently in the elderly that it
cannot be considered an untoward event or a system failure.

Sample  Size  Issues 

The number of patients on which a physician's profile is based must be large enough to
ensure statistical reliability, and even if data from all payers are available, this small-number
problem remains a serious one. For example, Luft and his colleagues reviewed the number of
procedures performed per provider per year in two states (Luft 1991). In New York the
analysis covered all of the physicians' cases; in California it covered only the cases insured by
Blue Shield (BS), the second largest carrier in the state. In New York, over 62,000 cesarean
sections were performed by 2586 providers; volumes per provider ranged from 1 to 248, with
an average of 24. For transurethral resection of the prostate, the range was 1 to 153 and the
average 19. Data from California show how statewide volumes differ from those in a smaller
area (San Francisco), where feedback interventions are likely to be more successful. For
cholecystectomy among California patients covered by BS, volume per provider ranged from
1 to 8 and averaged 1.3; in San Francisco the corresponding figures were 1 to 2 and 1.1. These
examples illustrate the problem whether all of the data (New York) or only a fraction of the
data (California) are available. In the latter case, the experience of a single insurer that covers
only a fraction of a provider's patients may reflect a sicker or healthier population than
average. If, for example, sicker-than-average patients selectively enroll in a fee-for-service
(FFS) plan more often in one area
of  the  country  than  in  another,  physicians  in  the  area  with  sicker  patients  in  the  FFS  plan 
may  be  unfairly  judged  to  provide  a  more  intensive  style  of  care,  even  controlling  for  case 
mix. 
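The arithmetic behind the small-number problem can be sketched briefly. The Python below is a minimal illustration, not part of the studies cited: it assumes a hypothetical underlying complication rate of 10 percent and uses the usual normal approximation to the binomial to show how wide a single provider's 95 percent interval is at the average New York volumes reported above, and roughly how many cases would be needed for a tighter estimate. The approximation is itself unreliable at very small volumes such as the 1.3 cholecystectomies per provider in the California BS data, so those volumes are left out.

```python
import math

def rate_half_width(p: float, n: int, z: float = 1.96) -> float:
    """Approximate 95% confidence half-width for a rate p observed on n cases."""
    return z * math.sqrt(p * (1 - p) / n)

def cases_needed(p: float, half_width: float, z: float = 1.96) -> int:
    """Smallest n whose approximate 95% half-width is at most half_width."""
    return math.ceil(z * z * p * (1 - p) / (half_width * half_width))

TRUE_RATE = 0.10  # hypothetical complication rate, not taken from the study
average_volumes = {"cesarean section (NY)": 24, "TURP (NY)": 19}

for procedure, n in average_volumes.items():
    hw = rate_half_width(TRUE_RATE, n)
    print(f"{procedure}: 10% +/- {100 * hw:.1f} percentage points")

print("cases needed for +/- 5 points:", cases_needed(TRUE_RATE, 0.05))
```

Under these assumptions the interval at 24 cases runs from about 0 to 22 percent, and roughly 139 cases would be needed to narrow it to plus or minus 5 percentage points, far more than most providers perform in a year.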

The small-number problem also exists at the hospital level. In the study of mortality rates
using the Medicare Mortality Predictor System, most of the variation in annual death rates for
stroke, pneumonia, acute myocardial infarction (AMI), and congestive heart failure was
attributable to chance. This was, in part, because of the small number of patients seen yearly
in each hospital (Jencks et al. 1988). For example, for four hospitals in that study, the yearly
number of stroke patients ranged from 19 to 50.
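One rough way to ask how much of the between-hospital variation could be chance is to compare the variance that binomial sampling alone would produce, if every hospital had the pooled death rate, with the variance actually observed. The sketch below does this by the method of moments; it is an illustrative assumption, not the method used by Jencks et al., and the death counts are invented (only the 19-to-50 range of patient volumes comes from the text).

```python
from statistics import mean, pvariance

def chance_share(deaths, patients):
    """Rough share of between-hospital variance in death rates due to chance.

    Compares the binomial variance expected if every hospital had the pooled
    rate with the observed variance of the hospital rates (method of moments).
    """
    rates = [d / n for d, n in zip(deaths, patients)]
    pooled = sum(deaths) / sum(patients)
    chance_var = mean(pooled * (1 - pooled) / n for n in patients)
    observed_var = pvariance(rates)
    if observed_var == 0:
        return 1.0
    return min(1.0, chance_var / observed_var)

# Hypothetical stroke data for four hospitals with 19 to 50 patients each.
patients = [19, 27, 38, 50]
deaths = [5, 4, 9, 10]
print(f"share of variance attributable to chance: {chance_share(deaths, patients):.2f}")
```

For these made-up figures the expected chance variance exceeds the observed variance, so the function reports 1.0: none of the spread in rates needs to be attributed to real differences among the hospitals.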

Issues  With  Regard  to  Data  Items  on  Resource  Use 

With regard to resource use, several questions are relevant:

•  Are the codes specific enough? Is it possible to differentiate new from old, or
expensive from less expensive, services (e.g., tissue plasminogen activator (TPA) versus
streptokinase; abdominal cholecystectomy versus laparoscopic cholecystectomy)?
ICD-9 codes do not keep up with advances in treatment, and codes for new
procedures can take years to develop.

•  Is the definition of the service clear? Is there any bundling of services? Some
examples are obvious (e.g., prenatal care); others are less so. For example, in the
Intensive Care Unit (ICU), oxygen therapy may or may not be included in an ICU
charge, and the term "ICU" itself may embrace a number of different kinds of sites
with differing nurse-to-patient ratios. In ambulatory care, if more than one service
is submitted on the same claim, they may all be rolled up under the most expensive
one in available data sets.

•  Is there any double counting (Grady 1991)? Medicaid files, for example, may contain
a bill, an adjustment of the bill, and finally a denial of the bill; unless careful
adjustments are made, three visits could be counted instead of just one.

•  Do  the  data  and  analyses  allow  for  evaluation  of  substitution  effects  (e.g.,  mental 
health  visits  versus  primary  care  visits;  anti-hypertensive  medications  versus 
biofeedback  sessions)?  Information  on  noncovered  services  and  services  above 
a  maximum  allowed  level  will  be  particularly  important  here. 

•  Is it possible to obtain data on drug use? Information on drugs is virtually never
routinely available except for Medicaid, yet drug use is an important indicator of
quality: e.g., thrombolysis in eligible patients after an AMI, beta-blockers after an
AMI, and prophylactic antibiotics after certain surgical procedures.


•  Is it possible to identify the responsible provider? For Medicare, the Unique
Physician Identification Number (UPIN) will help solve problems in this regard in
the future, but even then it will be difficult to distinguish between an attending
and an assistant surgeon, and between care provided by residents and care
provided by the head of the department (under whose number residents are
recorded). The issue is more complex for ambulatory patients, for whom there may
be no single "physician of record." The use of the majority provider in the DEMPAQ
study illustrates one approach to this problem.

•  Is  the  denominator  to  which  these  services  apply  known,  i.e.,  the  number  of 
dependents  and  months  of  coverage  involved?  For  individual  physicians  (except 
those  in  certain  HMOs)  this  is  a  major  problem. 
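Two of the questions above, double counting and identification of the responsible provider, come down to simple bookkeeping rules. The Python sketch below assumes a hypothetical claim layout in which a bill and its later adjustment and denial share a claim identifier; it keeps only the latest transaction per claim and then applies one possible reading of a "majority provider" rule (a provider with more than half of a patient's visits). The field names are invented, and the DEMPAQ study's exact definition may differ. Whether a finally denied claim should count as a visit is a separate policy choice that the sketch leaves open.

```python
from collections import Counter
from typing import NamedTuple

class Transaction(NamedTuple):
    claim_id: str   # hypothetical key shared by a bill and its later adjustments
    seq: int        # posting order of the transaction
    kind: str       # "bill", "adjustment", or "denial"
    patient: str
    provider: str

def final_claims(transactions):
    """Keep only the latest transaction per claim so one visit is counted once."""
    latest = {}
    for t in transactions:
        if t.claim_id not in latest or t.seq > latest[t.claim_id].seq:
            latest[t.claim_id] = t
    return list(latest.values())

def majority_provider(claims, patient, min_share=0.5):
    """Provider with more than min_share of the patient's visits, else None."""
    counts = Counter(c.provider for c in claims if c.patient == patient)
    if not counts:
        return None
    provider, n = counts.most_common(1)[0]
    return provider if n / sum(counts.values()) > min_share else None

txns = [
    Transaction("C1", 1, "bill", "P1", "DrA"),
    Transaction("C1", 2, "adjustment", "P1", "DrA"),
    Transaction("C1", 3, "denial", "P1", "DrA"),
    Transaction("C2", 1, "bill", "P1", "DrA"),
    Transaction("C3", 1, "bill", "P1", "DrB"),
]
claims = final_claims(txns)
print(len(txns), "transactions ->", len(claims), "visits")
print("majority provider for P1:", majority_provider(claims, "P1"))
```

Here five transactions collapse to three visits, and DrA, with two of the three, is the majority provider for patient P1.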

Issues  With  Regard  to  Data  Items  for  Quality 

Process  and  outcome  variables  have  been  used  as  measures  of  quality,  and  the  more 
serious  the  consequences  of  a  profile  (e.g.,  licensure,  credentialing),  the  greater  the 
number  of  independent  measures  that  should  be  used. 

Process variables are frequently resource variables, and thus the data considerations
mentioned above are equally relevant (see Weiner et al. 1990 for a list of possible process
measures). Process measures have the advantage of being disease-, procedure-, or
symptom-specific, but their specificity may not allow consideration of other important factors
in the delivery system that are outside the provider's control (e.g., patient compliance). In the
simplest case, however, processes can be considered direct measures of quality, since they
reflect standards of care that have been shown (or are believed) to have an impact on quality
or quantity of life.

With  regard  to  outcomes,  general  outcome  measures  (e.g.,  mortality,  readmissions,  some 
morbidities)  have  the  advantage  of  potentially  integrating  all  aspects  of  a  patient's  care 
into one measure. Their very sensitivity means poor specificity, however, and thus they
cannot necessarily be linked to a specific patient-provider encounter or to a particular
element of care. Disease-specific outcomes (e.g., morbidities) are more specific and may be
easier to link to a particular encounter or process of care. For both general and specific
measures, however, the links or attributions may not be perfectly accurate.
Moreover,  outcomes  such  as  mortality,  morbidity  and  readmissions  must  be  used  with 
caution  as  measures  of  quality  for  a  particular  provider  because  of  the  multiple  factors  that 
can  contribute  to  these  outcomes  and  that  are  beyond  the  control  of  the  provider.  In 
essence,  these  are  proxies  or  indirect  measures  of  quality. 


For  both  hospitalized  and  ambulatory  patients,  completeness  of  data  is  a  problem.  Studies 
involving  patients  at  the  Brigham  and  Women's  Hospital  and  five  other  teaching  hospitals 
illustrate this (Geary et al. 1991). For case-mix adjustment, there were several data
problems even in the complete medical record. For example, for 20 to 72 percent of patients
having a coronary artery bypass graft (CABG), there was no information about key
physiologic measures (e.g., end diastolic pressure; presence of three-vessel disease), and
cardiac catheterization results were missing from the hospital charts about one-third of the
time, largely for referred patients. Baseline functional status explains a large fraction (about
one-third) of the variation in resource use and/or quality of care, yet information on this item
was seldom recorded systematically in the medical record. On the outcome side, the
complications  of  interest  (e.g.,  return  to  the  operating  room  after  surgery,  hip  dislocation 
after  hip  replacement)  were  generally  not  in  the  discharge  abstract  and  had  to  be  obtained 
by  manual  chart  review.  Other  measures  of  outcome  (e.g.,  function,  satisfaction)  needed 
to  be  obtained  directly  from  patients  after  discharge. 

Case-Mix  Adjustments 

The  extent  of  data  needed  for  case-mix  adjustments  will  depend  upon  the  setting.  For 
ambulatory  patients,  physicians  usually  provide  care  for  several  diseases  over  several 
months or years. Thus, case-mix adjustments should be based on patient experiences over a
period long enough to observe the full range of diagnostic evaluations, treatments, and
outcomes likely to occur. Ambulatory Care Groups (ACGs) provide an example of this
approach (Weiner 1991). Alternatively, if studies of a particular condition or preventive service
are desired, a specific time interval should be selected empirically to measure an episode, that
is, "a series of temporally contiguous health care services related to treatment for a specific
condition" (Garnick et al. 1990). At the level of a particular condition, disease-specific
case-mix adjustment systems are likely to be necessary. In all cases, systems developed for
one group of patients (e.g., Medicaid) need to be validated before they are used in other
populations (e.g., Medicare). In contrast to
ambulatory  patients,  hospitalized  patients  are  generally  admitted  for  a  particular  reason, 
and  profiles  should  be  based  on  case-mix  adjustments  relevant  to  the  reason  for  admission. 
Thus,  the  data  system  must  be  able  to  identify  conditions  present  at  the  beginning  of  an 
illness  and  to  differentiate  them  from  late  sequelae  of  the  disease  or  from  results  of 
treatment. 
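Episode construction can be illustrated with a simple gap rule. In the sketch below, which is an assumption rather than the Garnick et al. method, a patient's services for one condition are sorted by date and a new episode starts whenever the interval since the previous service exceeds a "clean period"; the 60-day default is hypothetical, since the text calls for the interval to be selected empirically.

```python
from datetime import date

def group_episodes(service_dates, clean_period_days=60):
    """Split one patient's dated services for a condition into episodes.

    A new episode begins when the gap since the previous service exceeds
    clean_period_days, a hypothetical threshold that should in practice be
    chosen empirically for each condition.
    """
    episodes = []
    for d in sorted(service_dates):
        if episodes and (d - episodes[-1][-1]).days <= clean_period_days:
            episodes[-1].append(d)   # continue the current episode
        else:
            episodes.append([d])     # gap too long: start a new episode
    return episodes

visits = [date(1991, 1, 5), date(1991, 1, 20), date(1991, 2, 10),
          date(1991, 6, 1), date(1991, 6, 15)]
for i, ep in enumerate(group_episodes(visits), start=1):
    print(f"episode {i}: {ep[0]} to {ep[-1]}, {len(ep)} services")
```

With the default window the five visits form two episodes (January 5 to February 10, and June 1 to June 15); lengthening the window to 120 days merges them into one, which is why the choice of interval matters.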

Time  Periods  for  Reports 

The time period over which the data are collected and profiles made must be clinically
meaningful yet statistically sound, and there is a tension between these requirements. For
example, 9 to 12 months of data will generally be required to achieve anything approaching a
reasonable sample size (see above); however, a period of 9 to 12 months may seem
exceedingly long to a physician trying to remember how he or she cared for patients that long
ago. Periods shorter than 9 to 12 months will produce noisy data; if such periods are used,
analyses should be repeated
periodically. Over periods longer than 9 to 12 months, new medical practices may be
introduced, making the earlier data appear irrelevant. Finally, the lag time between the end
of the profile period and the presentation of data from that period should be minimal; in this
context, however, "end" needs to be defined: the last patient visit, the last test performed for
patients who visited in that period, the last claim received, etc.

ANALYTIC  FRAMEWORK 
General  Considerations 

Physician profiling involves the use of hierarchical levels of observational data: patients
can be grouped by physician, physicians by provider organizational unit (e.g., practice site),
units by larger organizations, and so on. The analysis of these data should also make use of
this hierarchical structure. In this section we outline such an analytic approach, whose
building blocks are typically regression equations describing the relation of an outcome to
characteristics of the observational units. At the most fundamental level, in which patients
are  grouped  by  physician,  equations  would  relate  patient  characteristics  to  outcomes  for 
a  given  physician.  At  all  levels,  however,  even  with  the  best  models  available,  the 
observational  nature  of  the  data  makes  it  difficult  to  arrive  at  causal  inferences.  For 
example,  in  the  simplest  case,  if  a  regression  model  were  available  with  perfect  explanatory 
power,  its  coefficients  would  identify  without  error  those  factors  that  contribute  to  a  better 
or  worse  outcome.  However,  we  would  not  know  what  it  was  about  those  factors  that 
contributed  to  the  outcome.  Thus,  specific  remedial  actions  would  not  be  apparent. 
Moreover,  this  model  would  not  preclude  the  existence  of  another  model  with  different 
independent variables that was equally predictive. For example, if a perfect model showed
that the age of patients contributed to higher mortality rates for patients treated by a
particular physician, we would not know what it was about age that did this. It could be
that his or her elderly patients were not offered the best treatment, that they were offered it
and declined, or that some other, unmeasured characteristics of the patients were
contributing to the event.

In practice, models have considerably less than perfect predictive power. The resulting
"noise" makes it even more difficult to arrive at conclusions that lead to specific
recommendations for improvement. Thus, using profiles for quality improvement is a first
step, but a limited one in the absence of process data. However, it is difficult to incorporate
process as an independent variable in any model because of selection biases. For example, a
given physician might not offer a particular process element (e.g., thrombolysis) to all
patients; because the patients who receive it may differ systematically from those who do
not, any relation between that process measure and outcome is confounded. This difficulty
could be overcome if we were able to select cohorts of patients with similar types of care (as
in a clinical trial).


In summary, then, efforts to profile physicians should be carried out in a way that is
commensurate with the complexity of the data and that can lead to improvements in
quality of care or in the cost-effectiveness of care. Incorporation of process elements will
help with the latter goal; improved statistical techniques will help with the former. These
techniques are described below, but they are not yet available for routine use in their most
general formulation.

Hierarchical  Modeling 

A detailed discussion follows, not to prescribe exact analytic approaches, but rather to
indicate a structural framework that should be used for complex problems involving multiple
sources of variability. In the absence of this line of thinking and analysis, spurious results or
interpretations can arise for at least three reasons:

•  providers  see  different  numbers  of  patients, 

•  providers  see  different  types  of  patients,  and 

•  averages  across  groups  of  providers  can  mask  considerable  heterogeneity  within 
each  of  the  groups. 

An  ideal  analytic  approach  would  control  for  various  sources  of  variability  that  might 
contribute  to  a  given  outcome,  and  the  structure  of  the  analysis  would  reflect  the  structure 
of  the  underlying  data,  starting  from  the  most  basic  unit  (the  patient),  continuing  with  the 
doctor  and  then  moving  to  the  most  complex  (site  of  care)  (Garnick  et  al.  1990;  Hirshfeld 
1991).  Patient  characteristics  should  include  information  on  disease  severity,  comorbidity, 
emergent  status  (particularly  for  inpatients),  behavioral  characteristics  (e.g.,  doctor-seeking 
behavior), socioeconomic status, and sociodemographics. Relevant physician
characteristics depend upon whether the physicians have already been stratified (e.g.,
primary care providers versus specialists), but within any stratum they would include
information such as age, sex, practice intensity (i.e., the fraction of the practice devoted to a
particular disease or procedure), managed care membership, etc. Site variables would depend
upon whether the inpatient or the outpatient setting was being studied. On the inpatient side,
relevant variables include teaching versus nonteaching, public versus private, urban versus
rural, etc.; on the outpatient side, hospital ambulatory setting versus private office setting. For
both inpatient and outpatient comparisons, a measure of technology intensity would likely be
useful.

To illustrate the basic concept of hierarchical modeling, consider the following simplified
example in which several physicians treat patients with an acute myocardial infarction at a
single institution. For simplicity, assume that they all see the same type of patients and,
hence, that there is no need for case-mix adjustments. Even if all physicians had the same
theoretical mortality rate among their patients, the observed
mortality rates in any given year would most likely be different. In other words, the
variation in the observed mortality rates is generally larger than the variation in the true
(theoretical) mortality rates. A hierarchical model for this simplified situation has a
two-level structure. Level I describes variation in outcomes among patients of a single
physician: the number of observed deaths among patients of a particular physician follows a
probability distribution governed by the physician's theoretical mortality rate. Level II
describes variation of the theoretical mortality rates across physicians: these rates follow a
probability distribution over the population of physicians.

We  can  use  this  simple  hierarchical  model  in  two  ways:  (1)  to  generate  estimates  of  the 
population  distribution  of  theoretical  mortality  rates,  and  (2)  to  obtain  estimates  of  each 
physician's theoretical rate. The latter are derived by pooling the data across physicians
and "shrinking" the observed rates toward a central value, in a way that takes into account
differences in sample sizes across physicians.

For instance, suppose physician A treats 25 patients in one year with a mortality rate of
15 percent, and physician B treats 200 patients with a mortality rate of 20 percent. The
final estimates derived from the hierarchical model may reverse the order of the two
physicians, especially if the population mortality rate were much larger than 20 percent,
because physician A's rate, based on only 25 patients, is pulled much more strongly toward
the population rate than is physician B's. The rate estimates derived from the hierarchical
model are in general
smoother  and  more  stable  (and,  ultimately,  more  accurate)  than  the  estimates  obtained 
from  using  each  physician's  data  separately. 
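The reversal can be reproduced with the simplest form of shrinkage, a beta-binomial posterior mean, in which each physician's estimate is a weighted average of his or her observed rate and the population rate, with weight n/(n + m) on the physician's own data. The value m = 100 (the prior's strength, in patient-equivalents) and the two population rates below are assumptions chosen for illustration; a full hierarchical analysis would estimate them from all physicians' data.

```python
def shrunken_rate(observed_rate, n, prior_mean, prior_weight):
    """Posterior-mean estimate under a beta prior worth prior_weight patients.

    The weight on the physician's own data, n / (n + prior_weight), grows with
    volume, so low-volume physicians are pulled harder toward prior_mean.
    """
    w = n / (n + prior_weight)
    return w * observed_rate + (1 - w) * prior_mean

PRIOR_WEIGHT = 100  # hypothetical; a full analysis would estimate it from the data
for population_rate in (0.10, 0.30):
    a = shrunken_rate(0.15, 25, population_rate, PRIOR_WEIGHT)   # physician A
    b = shrunken_rate(0.20, 200, population_rate, PRIOR_WEIGHT)  # physician B
    print(f"population rate {population_rate:.2f}: A={a:.3f}, B={b:.3f}")
```

With a population rate of 10 percent, A's estimate (0.110) stays below B's (0.167); with 30 percent, A's estimate rises to 0.270 and overtakes B's 0.233, because A's 25 patients carry only a fifth of the weight while B's 200 carry two-thirds.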

The  above  example  can  be  made  considerably  more  realistic  by  incorporating  patient  case- 
mix  and  physician  characteristics.  Consider  a  typical  three-level  model  for  profiling 
physicians  practicing  in  one  of  several  sites.  Level  I  is,  as  before,  a  collection  of  models 
with  patient-level  characteristics  as  input.  These  models  share  the  same  mathematical 
form  and  describe  how  a  patient-specific  measure  (e.g.,  resource  use  or  quality  of  care) 
depends  on  the  characteristics  of  the  patients  for  a  specific  physician.  Thus,  for  each 
physician,  Level  I  specifies  a  model  applicable  to  the  patient  population  seen  by  that 
specific  physician.  Although  any  number  of  models  can  be  used  for  this  purpose,  for 
illustrative  purposes  a  linear  regression  model  is  simplest.  For  example,  if  the  level  of 
glycosylated  hemoglobin  is  considered  as  the  measure  of  quality  of  care  for  diabetics,  this 
measure  would  be  modeled  as  a  linear  function  of  several  patient-level  factors  (e.g.,  age, 
sex,  type  of  diabetes)  plus  random  error.  The  coefficients  of  the  linear  regression 
(including  the  intercept)  could  be  specific  to  the  particular  physician.  Hence,  Level  I 
specifies  a  separate  linear  model  for  each  physician: 

Measure of Quality = intercept
                     + linear combination of patient characteristics
                     + error.


Level II of this model is a between-physician model, which describes the variation among
the coefficients of the physician-specific models, within a given site, as a function of
physician characteristics plus error. It rests on the premise that a physician's style of
practice for patients of a given type varies with his or her own characteristics; in essence, it
is a case-mix adjustment for physicians. The approach treats the physician-level coefficients
mathematically in the same way that outcomes were treated at the patient level, so that each
physician-specific coefficient is a function of several factors. For example, a linear regression
model for Level II would have the form:

Coefficient = intercept
              + linear combination of physician characteristics
              + error.

Level  III  of  this  analysis  would  be  a  between-sites  model  and  would  be  relevant  when 
physicians  who  practice  in  different  sites  are  being  compared.  Thus,  for  example,  a 
physician  treating  diabetics  in  an  ambulatory  hospital  setting  may  have  quicker  access  to 
chemistry tests than does a physician practicing in a doctor's office. The model for Level
III describes the variability of the site-specific coefficients of Level II as a function of site
characteristics plus random error. Specifically, each site-specific coefficient is defined by:

Coefficient = intercept
              + linear combination of site characteristics
              + error.
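For readers who prefer symbols, the three verbal models above can be written in conventional multilevel notation. The notation is ours, not the paper's: y is the quality measure for patient i of physician j at site k; x, w, and z are patient, physician, and site characteristics; and the error terms at the three levels are written epsilon, u, and v. As the text describes, every Level I coefficient (including the intercept) may vary across physicians, and every Level II coefficient across sites.

```latex
\begin{aligned}
\text{Level I:}\quad   & y_{ijk} = \beta_{0jk} + \sum_{p=1}^{P} \beta_{pjk}\, x_{pijk} + \varepsilon_{ijk} \\
\text{Level II:}\quad  & \beta_{pjk} = \gamma_{p0k} + \sum_{q=1}^{Q} \gamma_{pqk}\, w_{qjk} + u_{pjk},
                         \qquad p = 0, \dots, P \\
\text{Level III:}\quad & \gamma_{pqk} = \delta_{pq0} + \sum_{r=1}^{R} \delta_{pqr}\, z_{rk} + v_{pqk},
                         \qquad q = 0, \dots, Q
\end{aligned}
```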

This approach will lead to an estimated adjusted outcome per physician that can be used
in at least two ways. First, if values of an outcome measure can be classified as acceptable
or unacceptable, then physicians can be grouped accordingly. Second, if the range of
acceptability is unclear, physicians can be ranked, and screens placed to identify the
extreme tails of the distribution. In addition, at a more fundamental level, the magnitude
of  the  various  coefficients  will  indicate  the  importance  of  the  corresponding  variables  in 
achieving  that  outcome.  They  can  be  used  as  a  guide  to  identifying  areas  for  further  study 
of  process  that  might  ultimately  be  the  cause  of  differences  in  quality. 
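Screening the tails of a ranked distribution can be sketched with percentile cutoffs. The 5th and 95th percentile screens and the adjusted estimates below are hypothetical, and the Python uses only statistics.quantiles from the standard library.

```python
from statistics import quantiles

def flag_tails(estimates, lower_pct=5, upper_pct=95):
    """Flag physicians whose adjusted estimate lies outside percentile cutoffs."""
    cuts = quantiles(estimates.values(), n=100, method="inclusive")  # 99 cut points
    low_cut, high_cut = cuts[lower_pct - 1], cuts[upper_pct - 1]
    flags = {}
    for physician, value in estimates.items():
        if value < low_cut:
            flags[physician] = "low"
        elif value > high_cut:
            flags[physician] = "high"
    return flags

# Hypothetical adjusted mortality estimates for 11 physicians.
adjusted = {f"MD{i:02d}": r for i, r in enumerate(
    [0.10, 0.11, 0.12, 0.13, 0.14, 0.15, 0.16, 0.17, 0.18, 0.19, 0.40], start=1)}
print(flag_tails(adjusted))
```

A percentile screen always flags someone: MD01 is flagged only because it is the lowest of eleven, while MD11 stands well apart from the rest. Flags of this kind identify physicians for further review, not conclusions about quality.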

If this analysis is completed for multiple outcomes (e.g., length of stay, patient satisfaction,
mortality, serious complications), physicians could be rated on each of these. The data could
then be interpreted either informally or formally. A formal approach could either place
value-driven weights on each of the outcomes or derive data-driven weights to be used in the
formulation of a composite index. The latter might be obtained by use of principal component
or factor analysis.
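The two formal approaches can be sketched side by side. Each outcome is first standardized across physicians; a composite is then either a value-weighted sum or a sum weighted by the first principal component of the outcomes' correlation matrix, found here by power iteration in plain Python. The outcome values and the value weights are hypothetical, every outcome is oriented so that higher is worse (a measure such as patient satisfaction would need its sign reversed first), and factor analysis is not shown.

```python
from statistics import mean, pstdev

def zscores(values):
    """Standardize one outcome across physicians (population SD)."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

def composite(outcomes, weights):
    """Weighted sum of standardized outcomes, one score per physician."""
    z = [zscores(col) for col in outcomes]
    return [sum(w * col[i] for w, col in zip(weights, z)) for i in range(len(z[0]))]

def first_pc_weights(outcomes, iters=200):
    """Data-driven weights: first principal component of the correlation matrix."""
    z = [zscores(col) for col in outcomes]
    n, k = len(z[0]), len(z)
    corr = [[sum(a * b for a, b in zip(z[i], z[j])) / n for j in range(k)]
            for i in range(k)]
    w = [1.0] * k
    for _ in range(iters):  # power iteration converges to the leading eigenvector
        w = [sum(corr[i][j] * w[j] for j in range(k)) for i in range(k)]
        norm = sum(x * x for x in w) ** 0.5
        w = [x / norm for x in w]
    return w

# Hypothetical outcomes for four physicians; higher is worse in every column.
length_of_stay = [4.0, 5.0, 6.0, 7.0]
mortality = [0.05, 0.06, 0.08, 0.09]
complications = [0.10, 0.09, 0.14, 0.15]
outcomes = [length_of_stay, mortality, complications]

value_weights = [0.2, 0.5, 0.3]  # hypothetical value-driven weights
data_weights = first_pc_weights(outcomes)
print("value-weighted:", [round(s, 2) for s in composite(outcomes, value_weights)])
print("PC weights:", [round(w, 3) for w in data_weights])
print("PC-weighted:", [round(s, 2) for s in composite(outcomes, data_weights)])
```

Because all three hypothetical outcomes are positively correlated, the principal-component weights are all positive, and both composites rank the fourth physician worst.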


Summary 

We  should  assume  that,  with  the  best  data  analysis  possible  and  with  current  data 
availability,  statistical  models  will  explain  only  a  part  of  the  observed  variation.  Moreover, 
these  models  do  not  lead  to  direct  conclusions  on  how  to  improve  quality;  they  merely 
point  out  potential  differences  in  quality  and  may  flag  areas  to  review  to  improve 
processes  of  care. 

The above analytic framework can be used in efforts to improve quality even when the
full-scale hierarchical approach is not used. First, the very production of profiles using
unadjusted rates will provide the opportunity to convene physicians and other providers
and to discuss possible reasons why the profiles might not be accurate. At this point,
physicians will be "buying into" the process of profiling. Because of the likely inaccuracy of
the profiles, however, they should be used for nothing more than discussion purposes. These
discussions, a review of the literature, and consultation with outside experts will help identify
the patient-level characteristics that should be used to refine the profiles.

Second, a simple case-mix adjustment analysis will improve the accuracy of individual
physician profiles considerably.2 Such adjusted profiles will not reveal which physician
characteristics (e.g., age, level of training) are associated with differences; for educational
programs, this may be a particularly important deficiency. Nor will they indicate what it was
about the process of care that led to differences in resource use or outcomes across
physicians. Because patient differences will have been taken into account, however, the
credibility of the resulting profiles will increase considerably, and discussions to identify
process areas will be more fruitful.
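A simple case-mix adjustment of the kind described in the second point can be illustrated by indirect standardization: each patient is assigned the death rate of his or her risk stratum in the pooled data, and a physician's observed deaths are divided by the sum of those expected rates. The method, the two strata, and the data below are assumptions for illustration; the text does not specify a particular adjustment.

```python
from collections import defaultdict

def stratum_rates(patients):
    """Reference death rate in each risk stratum, pooled over all physicians."""
    deaths, counts = defaultdict(int), defaultdict(int)
    for p in patients:
        deaths[p["stratum"]] += p["died"]
        counts[p["stratum"]] += 1
    return {s: deaths[s] / counts[s] for s in counts}

def observed_to_expected(patients, physician):
    """Indirectly standardized ratio: observed deaths over deaths expected from
    the reference stratum rates applied to this physician's own patients."""
    reference = stratum_rates(patients)
    own = [p for p in patients if p["physician"] == physician]
    observed = sum(p["died"] for p in own)
    expected = sum(reference[p["stratum"]] for p in own)
    return observed / expected

# Hypothetical data: A sees mostly high-risk patients, B mostly low-risk ones.
rows = ([("A", "high", 1)] * 2 + [("A", "high", 0)] * 2 + [("A", "low", 0)]
        + [("B", "low", 1)] + [("B", "low", 0)] * 3 + [("B", "high", 1)])
patients = [{"physician": d, "stratum": s, "died": x} for d, s, x in rows]
for doc in ("A", "B"):
    print(doc, f"O/E = {observed_to_expected(patients, doc):.2f}")
```

Both hypothetical physicians have the same crude mortality rate (2 of 5 patients, 40 percent), but once the mix of high- and low-risk patients is taken into account, A's observed-to-expected ratio is about 0.77 and B's about 1.43.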

CATEGORIZING  PHYSICIANS 
General  Considerations 

The size of the unit of analysis used in profiling can vary from a single physician's handling
of a single clinical presentation to a measure of virtually all physician activity in the U.S.
(as in Medicare's Volume Performance Standards (VPS)). In between are groups such as all
physicians in a specific practice, all members of a hospital medical staff, all physicians in a
given specialty (or set of specialties) within a specified geographic area, or all physicians in a
geographic area overall.

Larger units of analysis are associated with smaller statistical fluctuations arising from the
small-number problem. This advantage is offset by two disadvantages, however:
heterogeneity that cannot be adequately modeled and accounted for, and the inability to
assign responsibility to an individual physician or small group, as would be needed for
feedback educational activities.

2 In some instances case-mix adjustments will be less important than in others. General
obstetrical services are a good example of this.

Volume  Performance  Standards 

Five considerations are important in choosing the appropriate level of aggregation for the
VPS. First, the level of aggregation should be large enough to ensure reasonable constancy
of case mix (or the ability to adjust for differences in it). Second, it should be small enough
to make feedback possible and effective. Third, incentives should be such that any change
in an individual physician's behavior has a reasonable chance of being associated with
changes in his or her payments. Fourth, the level of aggregation should be small enough to
be useful in quality studies beyond those involving gross measures of overutilization or
underutilization. And, finally, proper attribution is necessary; this will be particularly
important for referral service providers (e.g., radiologists).

Thus, in considering responses to something like VPS, an aggregation level of all
physicians over all resource use is too large a grouping to meet any of the criteria.
Aggregating all cardiologists in the United States is better, but it is ideal neither for
feedback nor for quality measurement. Aggregating all cardiologists in Massachusetts is
better for feedback, incentives, and quality measurement; it falls short, however, because
case mix is likely to vary and must be controlled for.

In  terms  of  attribution  for  Medicare  beneficiaries,  the  UPIN  approach  will  solve  many 
problems  for  primary  providers,  leaving  ambiguity  primarily  for  referral  service  providers 
(e.g., radiologists, anesthesiologists, or pathologists). For these specialties there are
measures of utilization that they do control, and these should be identified. Illustrative
examples include additional views beyond the standard ones in radiology, additional lines
beyond the standard ones in anesthesiology, and additional stains or sections beyond the
standard ones in pathology. On the other hand, when radiology and pathology services
(particularly laboratory tests) are provided by primary care doctors themselves
(self-referral), these services should be attributed not to the specialty but to the primary
provider.

Quality 

For  measuring  quality  of  care,  a  lower  level  of  aggregation  will  likely  be  appropriate.  A 
fundamental  principle  should  be  that  if  the  same  quality  of  care  is  expected  regardless  of 
the  type  of  provider  (e.g.,  general  internist,  family  practitioner,  general  practitioner),  then 
these  providers  should  be  grouped  if  plans  or  groups  are  to  be  compared.  Thus,  on  a 
procedure  basis,  if  both  orthopedists  and  neurosurgeons  do  laminectomies,  for  example, 
the  profile  should  aggregate  their  results.  If  differences  among  types  of  providers  exist, 
they will be identified in the hierarchical modeling discussed above. Also, if family
practitioners are considered to provide quality of care and resource use similar to those of
internists, then it would be reasonable to group them and to exclude from the grouping
services provided by one but not the other (e.g., obstetrics).

At  any  level  of  aggregation,  specialty  classification  of  physicians  is  necessary.  This  can  be 
done  on  the  basis  of  self-reports,  Board  eligibility,  Board  certification,  percentage  of 
patients  admitted  to  hospital  with  a  specialty  diagnosis,  percentage  of  all  patients  treated 
(in  or  out  of  hospital)  with  a  specialty  diagnosis,  etc.  In  the  latter  two  cases,  a  cut-off 
percentage  must  be  chosen  to  differentiate  specialty  from  non-specialty  classifications. 
Whatever the classification used, decisions need to be made about specialty providers who
see patients with both general and specialty problems (e.g., a cardiologist). If, for example,
electrocardiograms per visit are being used as a resource or quality measure, which of the
cardiologist's patients should be included in the profile? If all are included, his or her
resource use will likely be lower than that of a full-time cardiologist. If only some, what
routine system can be used to identify which patients are in and which are out?
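The percentage-of-patients rule for specialty classification can be written as a short function. The set of three-digit ICD-9 categories treated as cardiology diagnoses and the 50 percent cutoff are hypothetical choices for illustration, and each patient is represented by a single principal diagnosis.

```python
# Illustrative ICD-9 three-digit categories treated as cardiology diagnoses (assumption).
CARDIOLOGY_CODES = {"410", "427", "428"}  # AMI, cardiac dysrhythmias, heart failure

def classify_physician(patient_diagnoses, specialty_codes, cutoff=0.5):
    """Label a physician a specialist if the share of patients with a specialty
    diagnosis reaches the cutoff (a hypothetical threshold)."""
    if not patient_diagnoses:
        raise ValueError("no patients to classify")
    hits = sum(dx in specialty_codes for dx in patient_diagnoses)
    share = hits / len(patient_diagnoses)
    return ("specialist" if share >= cutoff else "non-specialist"), share

# One principal diagnosis per patient for a hypothetical physician.
panel = ["410", "428", "250", "401"]  # 250 = diabetes, 401 = essential hypertension
print(classify_physician(panel, CARDIOLOGY_CODES))              # share is 0.5
print(classify_physician(panel, CARDIOLOGY_CODES, cutoff=0.6))
```

The same physician is classified as a specialist at a 50 percent cutoff and as a non-specialist at 60 percent, which illustrates how much the choice of cut-off matters.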

STANDARDS  OR  NORMS  FOR  COMPARISONS 
Current  Activities 

Local and national averages have been used as standards in profiling. Local averages have
the advantage of reflecting the beliefs and experiences of a community of physicians who
have the opportunity to talk to each other and who can be jointly educated by others.
National averages have the advantage of being built on larger experiences that are likely to
be less biased than local experiences. National experience should probably serve primarily
as a starting point for discussion; upon discussion, it should be modified to take local
sensitivities into account.

Normative  or  prescriptive  standards  have  been  proposed  as  an  alternative  to  national  or 
local  averages.  In  some  cases,  particularly  for  use  of  preventive  or  screening  services,  the 
normative standard is clear: use of X units per patient per time period. However,
problems  with  the  nature  of  the  denominator  may  arise  depending  upon  the  purpose  of 
the  profile.  For  example,  how  should  an  individual  physician  be  measured  in  terms  of 
compliance  with  mammography  standards?  Should  ratios  be  calculated  using  all  patients 
in  his/her  patient  panel  or  only  those  who  make  appointments? 
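A small calculation shows how much the denominator choice matters. The panel data in this sketch are invented for illustration, and the `PanelMember` fields (`eligible`, `had_visit`, `had_mammogram`) are hypothetical. The only point is that the same physician's compliance rate changes depending on whether the whole eligible panel or only the patients who made appointments are counted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PanelMember:
    eligible: bool       # meets the (assumed) criteria of the screening standard
    had_visit: bool      # made at least one appointment in the period
    had_mammogram: bool  # was screened in the period


def compliance(panel: list[PanelMember], denominator: str) -> float:
    """Share of eligible patients screened, using the chosen denominator."""
    eligible = [m for m in panel if m.eligible]
    if denominator == "panel":
        base = eligible
    elif denominator == "visits":
        base = [m for m in eligible if m.had_visit]
    else:
        raise ValueError("denominator must be 'panel' or 'visits'")
    if not base:
        return float("nan")
    return sum(m.had_mammogram for m in base) / len(base)


# Hypothetical panel: 10 eligible women, 6 of whom made appointments;
# 5 of those 6 were screened, none of the 4 who never came in.
panel = (
    [PanelMember(True, True, True)] * 5
    + [PanelMember(True, True, False)]
    + [PanelMember(True, False, False)] * 4
)
print(compliance(panel, "panel"))   # 0.5
print(compliance(panel, "visits"))  # 5/6, about 0.83
```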

Criteria  necessary  for  the  selection  of  appropriate  normative  standards  have  been 
discussed  in  relationship  to  malpractice  litigation  (Garnick  et  al.  1991;  Hirshfeld  1991). 
Those criteria that also appear applicable to profiling include the following: standards must be widely accepted, particularly by the group being profiled; they must be clear and unambiguous to the provider; and there must be sufficient clinical detail in the data set used for profiling to show adherence, or lack thereof, to a normative standard. They should also, obviously, be based on sound, publicly available data.

These criteria will be met with varying degrees of success in the context of profiling. The criterion of wide acceptance will generally be difficult to achieve unless the data supporting a standard are compelling, multiple organizations or institutions have approved it, and the individual being profiled relates to one of these institutions.³ Wide acceptance has already been achieved for a number of screening or preventive services and for a few other services as well (e.g., anesthesiology standards, pacemaker standards). In other situations, those being monitored for either quality or resource reasons will need to play an active role in the development of normative standards. The clarity of guidelines will depend upon the topic studied; except for preventive measures, not enough profiles have been developed to indicate how often guidelines are clear and unambiguous. It is likely, however, that for most clinical situations clear guidelines will require a level of detail greater than most providers are willing to follow or those profiling are willing to monitor.

Finally, there are major problems with the data and documentation needed to show adherence to standards. On the ambulatory side, multiple providers, and hence multiple records, will make it difficult to document the absence of an indication. For hospitalized patients, the integrated medical record should be helpful, but it may not provide enough detail. The Uniform Clinical Data Set (UCDS) will increase the amount of clinical information available for five percent of the Medicare population, but its value still needs to be determined. Application of guidelines for pacemakers illustrates the need for detailed data (AHA Task Force Report 1991). These guidelines divide indications for permanent pacemakers into three classes (Class I: general agreement that they should be implanted; Class II: divergence of opinion as to use; and Class III: no evidence that they are necessary). A review of the information needed for Class III situations is illustrative. For patients with an acute myocardial infarction (AMI), one Class III situation, in which a pacemaker is inappropriate, is "transient AV conduction disturbances in the absence of intraventricular defects." The information needed to identify these findings will generally not be available except through a detailed review of the full medical record.
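The data problem can be illustrated with a hypothetical rule for the single Class III example quoted above. The record fields below (`ami`, `pacemaker_implanted`, `transient_av_disturbance`, `intraventricular_defect`) are assumptions made for this sketch and do not reproduce the full logic of the pacemaker guidelines. The rule returns `None` (indeterminate) whenever a needed clinical field is missing. That is the typical result for a claims record, which shows only the diagnosis and the procedure.

```python
from typing import Optional

REQUIRED_FIELDS = (
    "ami",                       # acute myocardial infarction during the stay
    "pacemaker_implanted",       # permanent pacemaker placed
    "transient_av_disturbance",  # AV conduction disturbance that resolved
    "intraventricular_defect",   # intraventricular conduction defect present
)


def inappropriate_class_iii(record: dict) -> Optional[bool]:
    """True if the record matches the quoted Class III situation, False if it
    does not, and None if the clinical detail needed to decide is absent."""
    if any(record.get(field) is None for field in REQUIRED_FIELDS):
        return None
    return bool(
        record["ami"]
        and record["pacemaker_implanted"]
        and record["transient_av_disturbance"]
        and not record["intraventricular_defect"]
    )


claims_record = {"ami": True, "pacemaker_implanted": True}
chart_record = {**claims_record, "transient_av_disturbance": True,
                "intraventricular_defect": False}
print(inappropriate_class_iii(claims_record))  # None: claims data cannot decide
print(inappropriate_class_iii(chart_record))   # True after full chart review
```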

In  general,  then,  it  would  appear  that  normative  standards  can  be  used  only  for  the 
simplest  situations,  generally  binary  ones,  and  these  are  likely  to  occur  most  often  in  the 
prevention  arena  (e.g.,  mammograms,  immunization). 


³ These comments do not consider the potential role of the introduction of the Resource-Based Relative Value Scale. It is possible that if physicians feel hassled by new regulations, they will be uninterested in seriously evaluating the value and importance of guidelines in the context of profiling. Physicians providing primarily cognitive services may be less likely to hold this position.


Whatever the standard used, the large unexplained variance that is likely to exist in most profiling activities means that these profiles should be used for feedback, with particular attention paid to physician confidentiality. There are two reasons for this. First, in the absence of aberrant behavior on multiple or egregious measures over time, a case of "bad medicine" will likely not be airtight. At the physician level, the recent example of an obstetrician who artificially inseminated his patients would, if proved, likely be the closest example of egregious behavior over time that we have. Second, while the consequences for an individual physician are greater than those for an institution, the resources that an individual physician would have to dispute such a claim are likely to be modest. They will be considerably more modest, for example, than those a hospital would have in disputing HCFA predicted mortality rates for its patients.


FEASIBLE  OPTIONS 

Although there are currently many unresolved difficulties with regard to the collection, analysis, and interpretation of data for profiling, profiles can serve many useful purposes, particularly for education and quality improvement activities. These profiles will be of primary value to the organization in which they are developed and used; in essence, individuals being profiled have to "buy into" the process. Profiles are likely to serve primarily as a benchmark for discussion rather than as an absolute measure. Sidney Wolfe's confidence in the adequacy and quality of the adjustment systems available at this time is premature (Vibbert 1991). In most cases, therefore, adjustments will not provide an unassailable case against a particular provider.

The following options assume that the level of data analysis that exists currently will be the dominant solution for several years, and that more complicated and rigorous approaches will take several years to implement and refine.

Ambulatory  Care  Setting 

Many of the considerations mentioned above hold here as well, and the results will likely serve as a flag for aberrant behavior. The case-mix adjustment approaches are likely to be more general (e.g., ACGs) when an entire practice is being profiled than when experience with a particular disease is being profiled.

• Comparisons across sites using claims data and the ACG adjustment appear to be feasible. Preliminary results for primary care providers in the Maryland Medicaid project suggest that, for some diseases, very large differences in selected quality indicators (e.g., emergency department visits, "sentinel" hospital admissions) occur across sites of care (i.e., hospital, community group, private M.D.) (Weiner 1992). These results point the way towards a more detailed view at several levels: patient (in terms of disease severity, comorbidity, and social factors); physician; and site of care (types of facilities, adequacy of nursing staff, etc.).

•  Some  between-physician  comparisons  may  also  be  possible  within  sites,  but  small 
sample  sizes  and  potentially  inadequate  case-mix  adjustments  may  make 
interpretation  difficult.  For  example,  if  diabetes  is  the  disease  for  which  outcomes 
are  being  measured  for  physicians,  the  case-mix  adjustment  system  will  need  to 
adjust  for  severity  of  disease  and  other  comorbid  conditions,  and  information  on 
these  may  not  be  fully  available. 
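One common way to make the site comparisons described above is indirect standardization. Each patient's expected probability of an event (for example, an emergency department visit) is taken from a reference rate for his or her case-mix category, and each site's observed count is then divided by its expected count. The sketch below assumes that patients have already been assigned to ACG-like categories. The category labels and reference rates are hypothetical, and the ACG grouping logic itself is not reproduced.

```python
from collections import defaultdict


def observed_to_expected(patients, reference_rates):
    """
    patients: iterable of (site, case_mix_group, had_event) tuples.
    reference_rates: event rate per case-mix group in a pooled reference population.
    Returns {site: observed / expected}; a ratio above 1 means more events than
    the site's case mix would predict.
    """
    observed = defaultdict(int)
    expected = defaultdict(float)
    for site, group, had_event in patients:
        observed[site] += int(had_event)
        expected[site] += reference_rates[group]
    return {site: observed[site] / expected[site]
            for site in observed if expected[site] > 0}


# Hypothetical reference rates for two case-mix groups.
rates = {"low_morbidity": 0.1, "high_morbidity": 0.4}
patients = (
    [("hospital_clinic", "high_morbidity", i < 2) for i in range(5)]
    + [("private_md", "low_morbidity", i < 2) for i in range(10)]
)
print(observed_to_expected(patients, rates))
```

Both sites record two events. After adjustment, however, the private practice's ratio is about twice the hospital clinic's, because its healthier patients were expected to have fewer events. The ratio says nothing about why. As the text notes, it only points towards a more detailed look at patient, physician, and site factors.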

Hospital  Setting 

• Within a department or division, individual physicians can be compared with each other for resource use (e.g., length of stay) and for global satisfaction with care. This requires two conditions: some level of case-mix adjustment must be made beyond that of the ICD-9 or DRG level, and a statistical adjustment must be performed to take into account the fact that some physicians see different numbers of patients than do others. The standard can be the departmental average or the average from "comparable" groups (e.g., other comparable hospitals).

• Within a department or a division, individual physicians can be compared with each other with respect to quality of care if appropriate statistical adjustments (especially for case-mix) are made. Case-mix adjustment systems designed for quality of care are unlikely to be the same as those designed for resource models. The dependent variable in these models should ideally be a disease-specific one that can be causally related to a physician encounter or action. For example, return to the operating room would be a satisfactory measure for patients having coronary artery bypass grafts, and hip dislocations would be a satisfactory measure for physicians doing total hip replacements.

•  Comparisons  of  physician-specific  mortality  rates  are  problematic  because  deaths 
are  rare  and  because  an  event  like  death  reflects  not  only  the  contributions  of  an 
individual  provider  but  also  those  of  the  entire  setting  in  which  care  takes  place. 
In  general,  the  sample  sizes  available  for  a  particular  provider  in  a  realistic  time 
frame  will  not  allow  adequate  statistical  accuracy  to  reach  compelling  conclusions 
about  physician  performance  (Luft  and  Hunt  1986). 
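The sample-size problem can be quantified with an exact one-sided binomial test. The figures used here (50 cases a year and an expected mortality of 3 percent) are assumptions for illustration. The sketch also ignores case-mix adjustment entirely. Its only purpose is to show how little a single provider's mortality count can establish, which is the point made by Luft and Hunt (1986).

```python
from math import comb


def upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p): one-sided p-value for k or more deaths."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


def critical_deaths(n: int, p0: float, alpha: float = 0.05) -> int:
    """Smallest death count whose one-sided p-value under rate p0 is <= alpha."""
    for k in range(n + 1):
        if upper_tail(k, n, p0) <= alpha:
            return k
    return n + 1  # no achievable count is significant


def power(n: int, p0: float, p1: float, alpha: float = 0.05) -> float:
    """Probability of flagging a physician whose true mortality rate is p1."""
    k = critical_deaths(n, p0, alpha)
    return upper_tail(k, n, p1) if k <= n else 0.0


n_cases, expected_rate = 50, 0.03  # assumed annual volume and expected mortality
print(round(upper_tail(3, n_cases, expected_rate), 3))
print(critical_deaths(n_cases, expected_rate))
print(round(power(n_cases, expected_rate, 2 * expected_rate), 3))
```

Under these assumptions, three deaths (twice the expected rate) give a p-value of roughly 0.19. Five deaths (10 percent mortality) are needed before the count is significant at the 0.05 level. A physician whose true mortality is double the expected rate would be flagged less than 20 percent of the time.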

GENERAL  RECOMMENDATIONS 

Profiling  is  in  its  infancy,  and  several  recommendations  are  pertinent  to  its  successful 
application. 




• It will be essential for researchers and institutions developing profiles to work with the developers of new medical information systems that are being proposed and developed, e.g., the Hartford Foundation-supported Community Health Management Information System. This kind of collaboration will ensure that data collection is relevant, parsimonious, and accurate.

•  The  applicability  of  current  case-mix  systems  to  populations  other  than  those  for 
which  they  were  developed  needs  to  be  ascertained.  This  will  be  particularly 
important  if  these  systems  are  to  be  used  on  Medicare  patients.  Validation  should 
be  done  in  several  settings. 

•  Research  into  improved  methods  of  education  and  feedback  needs  to  be 
encouraged.  The  work  by  Lomas  and  his  colleagues  on  cesarean  sections  (Lomas 
et  al.  1989)  and  by  Soumerai  and  Avorn  (Soumerai  et  al.  1991;  Avorn  and 
Soumerai  1983;  Avorn  et  al.  1989;  Soumerai  et  al.  1987)  and  their  colleagues  on 
drugs  can  serve  as  models. 

•  To  the  extent  that  profiling  will  be  used  for  sanctions  (e.g.,  admission  privileges, 
credentialing),  there  must  be  considerable  redundancy  and  robustness  in  the  data 
used  for  the  decision.  Research  needs  to  be  done  on  what  kinds  of  information 
should  be  used  and  how  temporal  analyses  should  be  done. 

• Improved methods for analyzing profile data need to be developed. For example, physician- and site-specific factors affecting resource use or quality need to be identified, and these need to be linked to process items of care that can be changed. Tests for the robustness of hierarchical models need to be developed, as do statistical tests to evaluate these data over time. Sample size estimates for each level of the model are also needed.
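A simple empirical Bayes (normal-normal) shrinkage estimator illustrates both the call for work on hierarchical models and the earlier point about adjusting for physicians who see different numbers of patients. This is one possible approach, not a method proposed in this paper. It assumes a known within-physician standard deviation `sigma`, uses an unweighted grand mean, and estimates the between-physician variance by the method of moments. Physicians with few patients are pulled further toward the group mean than physicians with many.

```python
from statistics import mean, variance


def shrink_means(groups, sigma):
    """
    Empirical Bayes shrinkage of physician means toward the group mean.
    groups: {physician: (n_patients, observed_mean)}, at least two physicians.
    sigma: assumed within-physician standard deviation of the outcome.
    """
    ys = [y for _, y in groups.values()]
    grand = mean(ys)
    # Sampling variance of each physician's mean falls as patient count grows.
    sampling_var = {doc: sigma ** 2 / n for doc, (n, _) in groups.items()}
    # Method-of-moments estimate of the true between-physician variance.
    tau2 = max(0.0, variance(ys) - mean(list(sampling_var.values())))
    shrunk = {}
    for doc, (n, y) in groups.items():
        v = sampling_var[doc]
        weight_on_grand = v / (v + tau2) if v + tau2 > 0 else 1.0
        shrunk[doc] = weight_on_grand * grand + (1 - weight_on_grand) * y
    return shrunk


# Hypothetical mean length of stay (days) and patient counts per physician.
los = {"A": (5, 9.0), "B": (60, 6.0), "C": (40, 5.5), "D": (8, 4.0)}
for doc, est in shrink_means(los, sigma=3.0).items():
    print(doc, los[doc][1], round(est, 2))
```

Physician A's raw average of 9.0 days rests on five patients, so it is moved substantially toward the group mean. Physician B's average rests on 60 patients and barely changes.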

ACKNOWLEDGEMENTS 

We  are  indebted  to  Drs.  Paul  Cleary  and  Carl  Morris  for  helpful  discussions  regarding  the 
analytic  framework.  In  addition,  many  individuals  kindly  shared  their  personal  experiences 
with  us:  John  Alsip,  Chris  Bailey,  Robert  Berenson,  M.D.,  Andrew  Bindman,  M.D.,  James 
Cannon,  Lynn  Chapel,  David  Chinsky,  Jarrett  Clinton,  M.D.,  Kathy  Coltin,  Conrad  Deeter, 
Robert  Edmiston,  M.D.,  Jinnet  Fowles,  Ph.D.,  David  Gans,  Deborah  Garnick,  Sc.D.,  Tom 
Gotowka,  M.D.,  Sharon  Hiner,  M.D.,  Chris  Izui,  Julia  Janosi,  Ian  Leverton,  M.D.,  Ted 
Matson,  Kathleen  McCormick,  Ph.D.,  Pat  Merriweather,  R.  Heather  Palmer,  M.D., 
Elizabeth  Pappius,  Lois  Quam,  David  Rollo,  M.D.,  Ph.D.,  Bruce  Sams,  M.D.,  Mary  Sajdak, 
Marilyn  Schlein,  Richard  Sharpe,  Albert  Siu,  M.D.,  Beau  Stubblefield,  Jonathan  Sunshine, 
Ph.D., I. Steven Udvarhelyi, M.D., Gordon Vineyard, M.D., and Jonathan Weiner, Ph.D.




REFERENCES 


Alsip,  J.,  "Quality  of  Care:  A  Composite  Measure,"  Iowa  Foundation  for  Medical  Care, 
October  1990. 


AHA  Task  Force  Report,  "Guidelines  for  Implantation  of  Cardiac  Pacemakers  and 
Antiarrhythmia  Devices,"  Journal  of  the  American  College  of  Cardiology  18:1-13,  1991. 

Avorn,  J.,  P.  Dreyer,  K.  Connelly,  et  al.,  "Use  of  Psychoactive  Medication  and  the  Quality 
of  Care  in  Rest  Homes:  Findings  and  Policy  Implications  of  a  Statewide  Study,"  The 
New  England  Journal  of  Medicine  320:227-232,  1989. 

Avorn,  J.  and  S.B.  Soumerai,  "Improving  Drug-Therapy  Decisions  Through  Educational 
Outreach:  A  Randomized  Controlled  Trial  of  Academically  Based  'Detailing',"  The 
New  England  Journal  of  Medicine  308:1457-1463,  1983. 

Cleary, P.D., S. Greenfield, A.G. Mulley, et al., "Variations in Length of Stay and
Outcomes  for  Six  Medical  and  Surgical  Conditions  in  Massachusetts  and  California," 
Journal  of  the  American  Medical  Association  266:73-79,  1991. 

Codman,  E.A.,  "Hospital  Standardization,"  Surgery,  Gynecology,  and  Obstetrics  22:119,  1916. 

Garnick,  D.W.,  A.M.  Hendricks,  and  T.A.  Brennan,  "Can  Practice  Guidelines  Reduce  the 
Number  and  Costs  of  Malpractice  Claims?"  Journal  of  the  American  Medical 
Association  266:2856-2860,  1991. 

Garnick,  D.W.,  H.S.  Luft,  L.B.  Gardner,  et  al.,  "Services  and  Charges  by  PPO  Physicians 
for  PPO  and  Indemnity  Patients,"  Medical  Care  28:894-906,  1990. 

Grady,  K.F.,  "Physician  Utilization  Profiling:  The  Key  to  Managing  Ambulatory 
Utilization,"  in  Peter  Boland,  ed.,  Making  Managed  Healthcare  Work:  A  Practical  Guide 
to Strategies and Solutions (New York: McGraw Hill Inc., 1991), pp. 394-399.

Hannan,  E.L.,  H.  Kilburn,  H.  Bernard,  et  al.,  "Coronary  Artery  Bypass  Surgery:  The 
Relationship  Between  In-Hospital  Mortality  Rate  and  Surgical  Volume  After 
Controlling  for  Clinical  Risk  Factors,"  Medical  Care  29:1094-1107,  1991. 

Hirshfeld,  E.B.,  "Should  Practice  Parameters  be  the  Standard  of  Care  in  Malpractice 
Litigation?"  Journal  of  the  American  Medical  Association  266:2886-2891,  1991. 




Jencks, S.F., J. Daley, D. Draper, et al., "Interpreting Hospital Mortality Data: The Role
of Clinical Risk Adjustment," Journal of the American Medical Association 260:3611-3616, 1988.

Lomas, J., G.M. Anderson, K. Domnick-Pierre, et al., "Do Practice Guidelines Guide
Practice? The Effect of a Consensus Statement on the Practice of Physicians," The New England Journal of Medicine 321:1306-1311, 1989.

Luft,  H.,  unpublished  data,  1991. 

Luft, H.S. and S.S. Hunt, "Evaluating Individual Hospital Quality Through Outcome
Statistics,"  Journal  of  the  American  Medical  Association  255:2780-2784,  1986. 

McCoy,  C.E.,  J.  Fowles,  A.M.  Pheley,  et  al.,  "The  Feasibility  of  Using  Administrative 
Claims  Data  for  Quality  Assessment  of  Diabetes  Care  in  a  Health  Maintenance 
Organization,"  in  preparation,  1992. 

Palmer, R.H., T.A. Louis, L.N. Hsu, et al., "A Randomized Controlled Trial of Quality
Assurance  in  Sixteen  Ambulatory  Care  Practices,"  Medical  Care  23:751-770,  1985. 

Palmer,  R.H.,  J.  Weiner,  et  al.,  DEMPAQ  study,  ongoing,  1992. 

Sanazaro,  P.H.  and  R.M.  Worth,  "Measuring  Clinical  Performance  of  Individual  Internists 
in  Office  and  Hospital  Practice,"  Medical  Care  23:1097-1114,  1985. 

Soumerai,  S.B.,  J.  Avorn,  D.  Ross-Degnan,  et  al.,  "Payment  Restrictions  for  Prescription 
Drugs  under  Medicaid:  Effects  on  Therapy,  Cost,  and  Equity,"  The  New  England 
Journal  of  Medicine  317:550-556,  1987. 

Soumerai,  S.B.,  D.  Ross-Degnan,  J.  Avorn,  et  al.,  "Effects  of  Medicaid  Drug-Payment 
Limits  on  Admission  to  Hospitals  and  Nursing  Homes,"  The  New  England  Journal  of 
Medicine  325:1072-1077,  1991. 

Stewart, A.L., S. Greenfield, R.D. Hays, et al., "Functional Status and Well-Being of
Patients  with  Chronic  Conditions:  Results  from  the  Medical  Outcomes  Study," 
Journal  of  the  American  Medical  Association  262:907-913,  1989. 

Steinwachs,  D.M.,  J.P.  Weiner,  and  S.  Shapiro,  "Management  Information  Systems  and 
Quality," in N. Goldfield and D.B. Nash, eds., Providing Quality Care: The Challenge to
Clinicians  (Philadelphia:  American  College  of  Physicians,  1989),  pp.  160-180. 

Vibbert, S., "Judge Orders NY to Release Physician-Specific Death Rates," Medical
Utilization Review 19:1, 1991.

Weiner, J., unpublished data, 1992.

Weiner, J.P., "Ambulatory Case-Mix Methodologies: Application to Primary Care
Research," in H. Hibbard, P.A. Nutting, and M.L. Grady, eds., Conference Proceedings
Primary Care Research: Theory and Practice (Washington, DC: DHHS, 1991), pp. 75-81.

Weiner, J.P., N.R. Powe, D.M. Steinwachs, et al., "Applying Insurance Claims Data to
Assess Quality of Care: A Compilation of Potential Indicators," Quality Review
Bulletin 16:424-438, 1990.




PAPER  NO.  4 


IMPACT  OF  PROFILES  ON  MEDICAL  PRACTICE 


Authors: 


Stephen  C.  Schoenbaum,  MD,  MPH 
Katherine  Oates  Murrey 


Address: 


Harvard  Community  Health  Plan 
10  Brookline  Place  West 
Brookline,  MA  02146 




INTRODUCTION 

The ultimate objectives of profiling are to stimulate physicians to utilize services economically, to help physicians and others assess and improve the quality of care, and to enable payers to identify and manage "outlier" physicians and the cost-effectiveness of care provided by the "average" physician. In other words, the objective is to generate action.

This  paper  is  concerned  with  the  mechanisms  by  which  profile  information  can  be 
translated  into  actions.  We  assume  that  it  is  not  worth  obtaining  or  analyzing  profile  data 
unless  there  is  an  underlying  hypothesis  or  objective  which  is  action-oriented;  and  we  begin 
the  discussion  with  an  analysis  of  information  from  profiles  and  how  profiles  would  be 
expected  to  change  if  practices  changed. 

Next,  and  for  much  of  the  paper,  we  turn  to  the  ways  in  which  information  from  profiles 
coupled  to  management  interventions  can  affect  practice.  Most  often,  the  first  step  in  the 
chain  from  obtaining  performance  profiles  to  affecting  clinical  practice  involves  feedback 
of  the  performance  data.  Feedback  can  be  a  management  intervention  in  and  of  itself. 
Feedback  also  can  be  linked  to  other  interventions.  We  shall  review  results  which  have 
been  obtained  from  the  feedback  of  practice  profiles,  and  we  shall  review  factors 
associated  with  effective  feedback. 

The  ultimate  objective  of  practice  improvement  is  to  obtain  the  best  outcomes  with  the 
most  efficient  processes.  Both  the  notions  of  "best  outcomes"  and  "most  efficient 
processes"  are  moving  targets,  and  each  is  extremely  difficult  to  quantify.  The  very  nature 
of  the  data,  such  as  whether  it  is  process  or  outcome  that  is  being  measured,  interacts  with 
how  the  data  can  and  should  be  used  to  affect  practice. 

Profiles  can  be  used  to  stimulate  the  recipients  of  feedback  information  to  devise 
performance  improvements;  i.e.,  the  data  can  be  used  as  a  "carrot".  Alternatively,  the  data 
can  be  part  of  a  structure  designed  to  regulate  performance,  i.e.,  a  "stick".  When  feedback 
is  used  as  a  carrot,  the  information  must  be  actionable;  or,  there  must  be  an  explicit 
process  for  proceeding  from  the  available  information  to  actionable  information.  We  shall 
discuss  whether  the  carrot  is  better  than  the  stick  and  how  one  can  manage  improvements 
in  the  quality  of  clinical  practice. 

The entire discussion of this paper, whether it be about profiling, feedback, utilization review, quality assurance, or quality management, centers on the concept that the physician is the most important party in understanding and improving quality of care. Physicians, having been given this central role in a cast which includes many other important players, have an extremely significant responsibility, individually and collectively, for achieving better and more efficient care. We conclude the paper with a discussion of how best to help physicians play this role now and in the future.

DOCUMENTING  PERFORMANCE  VARIATION 

The  performance  of  an  individual  physician  or  medical  institution  has  no  meaning  unless 
it  can  be  compared  to  an  absolute  performance  norm  or  to  the  performance  of  others. 
When an aspect of performance has been measured for each of a group of individuals, such as medical practitioners, one can describe a distribution of performance. Since practitioners virtually never all do exactly the same thing, the distribution of performance can be expected to demonstrate "variation". Although the existence of such variation may be of academic interest, the ultimate objective of developing profiles and performing the measurements must be to improve medical practice, i.e., to stimulate action. The action involves linking the profile information to specific management interventions which, in turn, can affect clinical performance.

There  are  three  ways  in  which  profiling,  coupled  with  suitable  management  interventions, 
can  affect  the  distribution  of  performance  and  have  a  positive  impact  on  medical  practice. 
These  are:  1)  reduction  of  unnecessary  or  inappropriate  variation,  2)  enhancement  of  the 
overall  level  of  performance,  and  3)  elimination  of  unacceptable  performance. 

Reduction  of  Unnecessary  or  Inappropriate  Variation 

There  are  many  sources  of  variation  in  the  performance  of  clinicians,  and  by  no  means  is 
all  of  the  variation  due  to  differences  in  physician  preference  or  style  or  physician  skills. 
Physicians  practice  upon  different  patient  populations  in  a  variety  of  individual  or 
institutional  settings.  Population  differences  may  be  due  to  "case  mix"  (i.e.,  real  differences 
in  severity  of  illness);  or  they  may  be  due  to  differences  in  patient  preferences  (e.g., 
choices  between  practices  which  improve  quality  of  life  vs  quantity  of  life). 

Population  differences  generally  lead  to  appropriate  or  justifiable  variation.  In  contrast, 
after  adjustment  for  population  differences,  residual  variation  in  performance  between 
practitioners or between institutions may not be appropriate. Reduction of inappropriate or unnecessary variation is equivalent to narrowing the original distribution or, in other words, getting closer to a situation in which everyone does do the same thing (Figure 4-1A).

For  example,  individual  primary  care  physicians  in  a  very  large  group  such  as  a  large  staff 
or  group  model  HMO  may  have  very  different  rates  of  referral  of  patients  with  gynecologic 
problems for ultrasound studies or to gynecologists despite the fact that the patients they are seeing are very similar. On further examination, it might turn out that some are
under-referring  and  others  are  over-referring  (errors  of  omission  and  commission, 
respectively).  An  intervention  might  address  the  conditions  under  which  such  patients 
should  be  referred  and  might  try  to  give  all  physicians  the  same  knowledge  base  as  those 
with  the  "appropriate"  rate  of  referral.  Such  an  intervention,  if  successful,  would  be 
expected  to  narrow  the  distribution  of  performance  around  the  mean. 

Enhancement  of  the  Overall  Level  of  Performance 

Another  intervention  approach  which  could  be  taken  in  the  same  sort  of  health  care 
organization  would  be  to  teach  primary  care  clinicians  a  skill  or  procedure  which  is  usually 
performed by gynecologists. This would shift the distribution of performance towards a level which the organization would consider "better" (Figure 4-1B). There would
undoubtedly  be  a  shift  in  the  mean  referral  rate  for  the  procedure  even  though  there  might 
still  be  considerable  variation  around  the  mean. 

Elimination  of  Unacceptable  Performance 

A  careful  examination  of  physician  performance  might  reveal  that  certain  practitioners,  for 
whatever  reason,  simply  are  not  performing  at  an  acceptable  level.  One  might  choose  to 
eliminate  such  poor  performance.  For  example,  one  might  not  renew  these  physicians' 
credentials  to  perform  certain  procedures.  Elimination  of  outliers  or  "bad  apples"  is 
equivalent  to  cutting  off  one  tail  of  the  distribution,  the  tail  which  represents  performance 
below  an  acceptable  threshold  or  standard  (Figure  4-1C). 

Combined  Interventions 

The  three  types  of  improvement  are  mutually  compatible.  One  can  both  narrow  the 
overall distribution and shift the mean to a better level (Figure 4-1D). Indeed, as one
shifts  performance  to  a  better  level,  especially  as  the  mean  gets  closer  to  an  absolute  norm 
of  0  or  100  percent  performance  of  a  practice,  the  distribution  almost  certainly  will  narrow. 
The  net  result  is  that  the  entire  distribution  may  lie  above  the  threshold  of  unacceptable 
performance,  and  this  will  obviate  the  need  to  eliminate  "bad  apples".  Alternatively,  with 
a  combination  of  interventions,  one  could  eliminate  the  "bad  apples",  shift  the  mean  level 
of performance and narrow the overall distribution (also Figure 4-1D).
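The idealized interventions of Figure 4-1 can be written as simple transformations of a list of physician performance scores. This is a toy model under stated assumptions. Higher scores are better; narrowing pulls every score toward the mean by a fixed factor; shifting adds a constant; and truncation drops scores below a threshold. Real interventions do not act this uniformly.

```python
from statistics import mean, pstdev


def narrow(scores, factor):
    """Figure 4-1A: pull each score toward the mean; factor < 1 narrows the spread."""
    m = mean(scores)
    return [m + factor * (s - m) for s in scores]


def shift(scores, delta):
    """Figure 4-1B: move the whole distribution toward better performance."""
    return [s + delta for s in scores]


def truncate(scores, threshold):
    """Figure 4-1C: remove performers below the acceptable threshold."""
    return [s for s in scores if s >= threshold]


# Hypothetical scores (percent of patients receiving an indicated service).
scores = [40, 55, 60, 65, 70, 80]
threshold = 50
combined = shift(narrow(scores, 0.5), 10)  # Figure 4-1D, before any truncation
print(pstdev(scores), pstdev(combined))    # spread is halved
print(min(combined) >= threshold)          # no one is left below the threshold
```

In this example, narrowing and shifting alone lift every score above the threshold, so truncation removes no one. This mirrors the text's observation that a combined intervention may obviate the need to eliminate "bad apples".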

The  crucial  point  is  to  understand  that  distributions  usually  do  not  change  on  their  own. 
They  change  because  someone  has  designed  and  implemented  an  intervention.  As  we  shall 
see, regulatory or "stick" interventions are usually designed to affect outliers, whereas
"carrot"  interventions  may  affect  outliers  but  also  can  narrow  the  distribution  and  shift  the 
mean  performance  level. 




Necessary  Steps  in  Translating  Profiles  into  Opportunities  for  Practice  Improvement 

An  important  link  in  the  chain  from  the  development  of  information  on  performance  (e.g., 
a  practice  profile)  to  the  design  of  action  to  improve  performance  is  first  to  look  at  the 
data  (i.e.,  to  plot  them).  Figures  4-1  (a  through  d)  represent  hypothetical  distributions  of 
practice  performance.  Rarely  does  one  see  actual  plots  of  practice  performance.  Although 
it  is  commonly  assumed  that  most  practice  performance  data  have  an  underlying  normal 
distribution,  and  means  and  standard  deviations  are  calculated  on  that  basis,  it  is  quite 
possible  that  some  distributions  are  highly  skewed,  others  bimodal,  etc. 
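As the text recommends, the shape of the distribution should be checked before a profile is summarized with a mean and standard deviation. The sketch below prints a crude text histogram and computes the population skewness. Skewness is near zero for symmetric data and clearly positive for a long right tail. The referral rates are invented for illustration, and the five equal-width bins are an arbitrary choice.

```python
from statistics import mean, median, pstdev


def skewness(values):
    """Population (Fisher-Pearson) skewness: about 0 when symmetric."""
    m, s = mean(values), pstdev(values)
    if s == 0:
        return 0.0
    return sum(((v - m) / s) ** 3 for v in values) / len(values)


def text_histogram(values, bins=5):
    """One line per equal-width bin: lower edge, then a bar of '#' marks."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid zero width when all values are equal
    counts = [0] * bins
    for v in values:
        counts[min(int((v - lo) / width), bins - 1)] += 1
    return [f"{lo + i * width:6.1f} | {'#' * c}" for i, c in enumerate(counts)]


# Hypothetical referral rates per 100 visits for 12 physicians.
rates = [2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 12, 15]
print("\n".join(text_histogram(rates)))
print(round(skewness(rates), 2), mean(rates), median(rates))
```

Here two high referrers pull the mean above the median. A mean plus or minus two standard deviations would therefore describe most physicians poorly.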

Examining  a  distribution  is  only  the  beginning.  If  one  is  going  to  act  responsibly  upon 
performance data, one must understand whether the data represent a process which is in
"statistical  control"  (Deming  1986).  Figure  4-2  shows  information  on  cesarean  section  rates 
for  two  HMOs.  Even  though  initially  the  HMO  with  the  higher  rate  might  have  argued 
that  the  difference  was  not  meaningful,  the  apparent  stability  of  the  performance  in  each 
HMO  over  time  suggests  that  if  the  two  organizations  are  serving  similar  populations  and 
the  data  have  been  collected  by  comparable  methods,  there  must  be  a  true  difference  in 
the  process  of  care  between  them.  Our  judgment  of  the  stability  of  performance  in  these 
two  organizations  is  being  made  informally,  just  by  visual  inspection  of  the  data.  Many 
factors,  including  the  nature  of  the  patient  population  and  sample  size,  will  affect  the 
stability  of  profile  observations.  It  is  possible  to  devise  a  formal  "control  chart"  to 
demonstrate  whether  the  performance  falls  within  statistical  limits;  and  once  it  is  apparent 
that  the  performance  is  stable  or  in  statistical  control,  it  is  reasonable  to  assess  the 
underlying  processes  which  might  lead  to  the  differential  performance. 
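The formal control chart mentioned above can be sketched for rates such as quarterly cesarean sections per delivery. The version below is the standard Shewhart p-chart, with 3-sigma limits computed from the pooled rate. The quarterly counts are hypothetical and are not the data plotted in Figure 4-2.

```python
from math import sqrt


def p_chart(events, totals, z=3.0):
    """
    Shewhart p-chart. events[i] and totals[i] are, e.g., cesarean sections and
    deliveries in quarter i. Returns the centre line (pooled rate) and, for each
    quarter, (rate, lower limit, upper limit, within_limits).
    """
    p_bar = sum(events) / sum(totals)
    rows = []
    for x, n in zip(events, totals):
        se = sqrt(p_bar * (1 - p_bar) / n)
        lcl = max(0.0, p_bar - z * se)
        ucl = min(1.0, p_bar + z * se)
        rate = x / n
        rows.append((rate, lcl, ucl, lcl <= rate <= ucl))
    return p_bar, rows


# Hypothetical: four quarters, 500 deliveries each.
centre, quarters = p_chart([60, 55, 62, 58], [500] * 4)
print(f"centre line {centre:.3f}")
for q, (rate, lcl, ucl, ok) in enumerate(quarters, start=1):
    print(q, f"{rate:.3f}", f"[{lcl:.3f}, {ucl:.3f}]", "in control" if ok else "out of control")
```

Only when every point falls within the limits over a run of periods, as in this example, is it reasonable to treat the process as stable and compare it with another organization's. A point outside the limits would call for investigating that period first.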

When  practice  profiles  are  examined,  the  usual  assumption  is  that  each  point  of  the 
distribution  does  represent  an  individual  or  institution  "in  control".  This  assumption  is 
embodied  in  the  term  "bad  apple".  It  would  make  no  sense  to  consider  an  outlier 
physician  a  "bad  apple"  if  one  thought  that  the  observed  performance  was  unstable  and 
that  without  intervention  the  physician's  performance  would  be  observed  to  come  into  an 
acceptable  range. 

We  stated  at  the  outset  that  it  is  not  worth  obtaining  or  analyzing  profile  data  unless  there 
is  an  underlying  hypothesis  or  objective  which  is  action-oriented.  We  recognize,  however, 
that  it  has  been  common  up  to  now  to  translate  existing  data  sets  into  profiles  simply 
because  the  data  sets  were  available.  As  the  above  discussion  indicates,  we  need  to 
become  more  sophisticated  in  our  approaches  to  profiling  medical  practices.  Until  then, 
we  will  have  difficulty  understanding  how  much  of  the  variation  we  observe  is  due  to 
practice process differences that are amenable to improvement. This simple fact needs
to  be  kept  in  mind  as  we  consider  the  effects  of  simply  feeding  the  practice  information 
back  to  practitioners. 


Figure 1. Effects of Interventions on Performance Distribution

[Figure: four panels, each plotting the number of physicians (#MDs) against performance,
from worse to better, before and after the intervention.]

1A. Reduction of unnecessary variation. Intervention narrows the distribution around
the mean.

1B. Shift to better practice. Intervention shifts the distribution towards a better mean
performance.

1C. Elimination of unacceptable performance. Intervention results in a truncated
distribution of acceptable performers only.

1D. Combined effects. Interventions yield a better mean level of performance, a narrower
distribution around the mean, and acceptable performers only.


Figure 2. Cesarean Section Rates in Two HMOs, 1990-1991

[Figure: cesarean section rate (vertical axis, ticks at 5 and 10) for each HMO by quarter
(1-4) of 1990 and 1991.]


FEEDBACK  AS  AN  INTERVENTION  TO  IMPROVE  PRACTICE 

In most instances, feeding the information back to practitioners or institutions has
appeared to be a desirable initial intervention for obtaining a positive effect of profiles on
the quality of medical practice. There are several ways to accomplish this.

Feedback  Formats 

Peer  comparison  feedback  consists  of  reporting  to  an  individual  his  or  her  own 
performance  data  along  with  relevant  comparative  performance  data  of  others.  Grouped 
or  aggregate  feedback,  in  contrast,  involves  providing  individuals  with  information  on 
performance  of  the  group  of  which  they  are  a  part,  but  not  individual  information. 
Individual feedback consists of providing individual clinicians or institutions with information
about their own performance, without comparative information. This last type
of feedback is meaningful only if there is a clear norm for performance. In short, feedback
uses individuals or groups as the unit of aggregation of data and compares individual or
group  performance  to  either  normative  standards  or  practice-based  standards. 

Feedback  Functions 

Besides coming in several different formats, performance feedback can also serve a variety
of different functions. Performance feedback can be educational to the recipient. It can
also remind or prompt practitioners to perform certain tasks or processes. And it can be a
carrot or a stick: it can function as a positive incentive, leading or guiding performance
and providing a reward as "the numbers" get better, or it can be a negative incentive,
perceived as a punishment for poor performance. Each of these functions of performance
feedback is known to be, on its own, a modifier of physician performance (Eisenberg 1986).
In addition, performance feedback can be combined with other performance modifiers,
including programs of specific educational interventions, reminders, prompts,
notifications, rewards and punishments.

Review  of  Feedback  of  Profiles  (Appendix) 

We  are  going  to  explore  the  mechanics  of  performance  feedback  and  how  it  might  be 
employed  most  usefully  to  improve  practice  performance.  The  information  comes  from 
a  review  of  the  literature,  personal  experience  and  the  experience  of  a  selected  group  of 
individuals  known  to  the  authors  to  have  been  involved  in  activities  of  profiling  and 
feedback. 

The  Appendix  consists  of  an  annotated  bibliography  of  the  studies  we  have  located  in 
which  feedback  of  profile  information  or  performance  feedback  has  been  an  intervention 
(Appendix  4A;  summarized  in  Table  4A).  Information  from  sources  other  than  practice 
profiles  can  also  lead  to  feedback  as  an  intervention;  and  we  have  included  examples  from 
this  non-profile  experience  in  examining  the  effects  of  feedback. 

Factors  Affecting  the  Success  of  Feedback  of  Profiles 

From the 48 studies referenced and annotated in the Appendix, one can state that
feedback can, and usually does, improve performance (81 percent of all studies reviewed,
79 percent of all randomized controlled trials reviewed). Success has been reported for
studies involving peer comparison feedback (a positive result in 80 percent of controlled
trials using this methodology, 4/5), individual feedback (also 80 percent of controlled
trials, 8/10), aggregate feedback (67 percent of controlled trials, 2/3), and mixtures of
types of feedback (75 percent of controlled trials, 6/8). Success has been reported in most
studies involving practicing physicians alone (83 percent, 20/24), most involving house
staff alone (82 percent, 14/17), and most involving attending physicians along with house
staff (71 percent, 5/7). Feedback appears to have been successful when used alone (a
positive result in 70 percent of controlled trials, 7/10) and slightly more
successful when used in combination with some other performance enhancer such as
education,  rewards  or  opinion  leaders  (80  percent  of  controlled  trials,  12/15).  Feedback 
has  been  effective  when  given  less  than  one  month  after  the  event  (79  percent  of 
controlled  trials,  15/19)  or  more  than  one  month  after  the  event  (71  percent  of  controlled 
trials,  5/7).  Finally,  feedback  has  been  effective  when  the  objective  has  been  to  reduce 
utilization or cost (68 percent of controlled trials, 13/19), but even more effective (100
percent  of  controlled  trials,  7/7)  when  the  objective  has  been  to  enhance  utilization 
(usually  immunization  or  screening  practices). 
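The subgroup percentages above can be checked against the reported counts. The short script below is a convenience sketch (the labels are paraphrases of the text). It recomputes each percentage and confirms that the three clinician groups together account for all 48 studies, with 39 of 48 positive, which rounds to the 81 percent overall figure.

```python
def percent(positive, total):
    """Share of studies with a positive result, rounded to a whole percent."""
    return round(100 * positive / total)


# (positive results, number of studies, percent as reported in the text)
REPORTED = {
    "peer comparison feedback": (4, 5, 80),
    "individual feedback": (8, 10, 80),
    "aggregate feedback": (2, 3, 67),
    "mixed feedback types": (6, 8, 75),
    "practicing physicians alone": (20, 24, 83),
    "house staff alone": (14, 17, 82),
    "attendings with house staff": (5, 7, 71),
    "feedback alone": (7, 10, 70),
    "feedback plus other enhancer": (12, 15, 80),
    "feedback < 1 month after event": (15, 19, 79),
    "feedback > 1 month after event": (5, 7, 71),
    "objective: reduce utilization/cost": (13, 19, 68),
    "objective: enhance utilization": (7, 7, 100),
}

for label, (pos, n, reported) in REPORTED.items():
    print(f"{label:38s} {pos:>2}/{n:<2} = {percent(pos, n):>3}% (reported {reported}%)")

# The three clinician groups are mutually exclusive and cover every study in the Appendix.
groups = ["practicing physicians alone", "house staff alone", "attendings with house staff"]
pos_all = sum(REPORTED[g][0] for g in groups)
n_all = sum(REPORTED[g][1] for g in groups)
print(f"all studies: {pos_all}/{n_all} = {percent(pos_all, n_all)}%")
```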

Although  the  above  evidence  indicates  without  question  that  feedback  can  be  effective  in 
improving  clinical  performance,  the  reader  should  be  aware  of  several  important  points: 
First, the literature may selectively include studies with a positive result, so-called
"publication bias" (Fineberg 1985). Authors may be reluctant to submit, and editors to
publish, negative uncontrolled studies. They may be reluctant to submit or publish even
negative controlled trials, since such trials may have been designed with relatively small
sample sizes and lack sufficient power to yield a convincingly negative result (Freiman et
al. 1978). Second, it is essentially unheard of to attain 100 percent performance with
feedback alone or in combination with non-punitive measures. Accordingly, success, even
when it occurs and is reported, is not total. Third,
despite  the  many  reports  from  the  literature  summarized  in  the  Appendix,  many 
interesting  questions  remain  unanswered  and  even  unilluminated  by  the  reports.  The 
literature  describes  what  has  been  done  and  whether  it  has  worked.  Only  a  few  authors 
have  speculated  on  why  feedback  works,  and  we  shall  review  some  of  these  notions.  There 
are  virtually  no  formal  investigations  to  determine  why  feedback  works  so  well  or  so  poorly 
at  times.  As  one  examines  negative  reports  or  studies,  it  is  striking  that  the  reasons  for 
obtaining  negative  results  are  entirely  speculative  and  even  contradicted  by  some  positive 
studies.  For  example,  it  is  difficult  to  argue  that  a  study  which  attempted  to  decrease 
ordering  of  expensive  tests  failed  simply  because  it  was  cost-oriented  when  a  substantial 
percentage  of  cost-oriented  studies  have  yielded  a  positive  result. 
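The point about power (Freiman et al. 1978) can be made concrete. The sketch below uses the standard normal-approximation formula for comparing two independent proportions to show how a trial of a plausible feedback effect can be too small to detect it. The effect size, sample sizes, and function name are hypothetical. The calculation also assumes independent observations and ignores clustering of patients within physicians, which would reduce power further.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_power(p_control, p_feedback, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided z test comparing two independent proportions.

    Uses the usual normal approximation and ignores the (negligible) opposite tail.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    p_bar = (p_control + p_feedback) / 2
    se_null = sqrt(2 * p_bar * (1 - p_bar) / n_per_arm)  # standard error if no true difference
    se_alt = sqrt((p_control * (1 - p_control) + p_feedback * (1 - p_feedback)) / n_per_arm)
    return z.cdf((abs(p_feedback - p_control) - z_crit * se_null) / se_alt)


# Hypothetical trial: feedback raises compliance from 40 to 55 percent.
for n in (30, 100, 250):
    print(f"n = {n:3d} per arm: power = {two_proportion_power(0.40, 0.55, n):.2f}")
```

With only a few dozen subjects per arm, a real 15-point improvement would usually be missed, so such a trial's "negative" result says little.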

Obtaining  "Buy  In"  to  Profile  Information  or  Feedback 

Most  authors  and  other  students  of  feedback  argue  that  for  profile  information  to  have  an 
impact  on  practice,  the  physician  must  "buy  in"  to  the  profile.  Physicians  are  most  likely 
to  "buy  in"  if  the  information  in  the  profile  is  meaningful  to  them;  and  enhancing  the 
meaningfulness  of  profile  information  and  feedback  involves  content  issues  and  process 
issues. 

Content  Issues  in  Enhancing  the  Meaningfulness  of  Profile  Information  and  Feedback 

There  are  several  issues  concerning  the  content  of  profile  data  which  relate  to  the 
meaningfulness  of  the  information  and  effectiveness  of  its  feedback.  These  issues  are 
related to the statistical stability and the actionability of the information. Statistical
stability or control is a concept which we introduced previously, in the discussion of
performance  distributions.  The  actionability  of  profile  data  relates  to  the  closeness  of  the 
information  presented  to  an  action  step  which  might  improve  performance. 

Patients  seek  medical  care  to  solve  problems.  At  the  direction  of  physicians,  they  undergo 
processes-of-care  which  ultimately  yield  outcomes.  Information  pertaining  to  processes-of- 
care  is  more  likely  to  be  acted  upon  than  outcome  information.  Extreme  examples  of  this 
would  be  HCFA  mortality  data  (profiled  outcome-of-care  information)  versus  information 
on  potential  drug  incompatibilities  for  individual  patients  reported  to  individual  physicians 
(non-profiled process-of-care information). When HCFA first released mortality data for
hospitals, it engendered an enormous debate over the meaningfulness of the data and
relatively little, if any, action on the part of hospitals to understand their own outcomes or
to improve them. Many questioned whether the data on different hospitals took into
account population differences. Thus, they were actually questioning whether the
differences in outcome reflected differences in process of care. One can speculate,
however, that even if HCFA had done a better job at the outset of adjusting the data for
population differences, it would still have been quite difficult to act upon the differences in
outcome. The data alone do not indicate an appropriate action step. HCFA never
convened different institutions in an attempt to discover what process differences might
underlie the outcome differences, and only such an exploration would be likely to lead to
action (see the discussion below on "The 'Convener' Function as a Step in Understanding
Processes-of-Care").

A recent series of papers has presented apparently credible evidence of differences in the
outcome of coronary bypass surgery across institutions (O'Connor et al. 1991;
Williams  et  al.  1991).  These  papers,  however,  simply  take  one  to  the  starting  line  for 
exploration  of  action  (Berwick  1991). 

In  contrast,  when  information  is  obtained  from  surveillance  of  pharmacy  prescriptions  or 
claims  and  physicians  are  told  about  potential  drug  incompatibilities,  there  is  a  change  in 
orders  65  percent  of  the  time  (Jonathan  Edelson,  MD,  ValueRx,  Inc.,  personal 
communication).  In  this  instance  the  physicians  appear  to  "buy  in"  to  the  information, 
accept  it  as  having  face  validity,  and  take  action.  Interestingly,  they  do  so  even  though  in 
almost  all  instances  they  have  not  participated  in  the  development  of  the  information.  The 
action  is  relatively  simple  and  is  obvious  from  the  information  which  is  fed  back. 
Presumably,  all  the  physician  needs  to  do  is  examine  the  patient's  record  and  the 
medications  ordered,  determine  if  there  truly  are  incompatibilities,  and  then  either 
discontinue  or  change  one  of  the  medications. 

It  is  indisputable  that  one  way  to  achieve  "buy  in"  is  to  employ  well-accepted  or  acceptable 
scientific  principles.  This  becomes  a  rationale  for  basing  profiles  on  practices  or  practice 
guidelines  which,  in  turn,  are  supported  by  the  scientific  literature  or  by  accepted  clinical 
rules. This simple rationale means there is an explicit link between actionable profile
information, demonstrations of medical effectiveness, and clinical guidelines. Much work
is  currently  being  done  to  develop  clinical  guidelines,  but  so  far  there  have  been  few 
demonstrations  of  their  effectiveness.  One  would  predict  that  the  effectiveness  of  the 
guidelines  will  depend  on  how  closely  they  recommend  specific  actions  and  the  degree  to 
which  the  actions  derive  from  demonstrations  of  medical  effectiveness  or  from  the 
development  of  a  broad  consensus  among  physicians. 

When profile information does not derive from a scientific base, its content value still may
be  enhanced  by  making  sure  that  it  is  as  free  as  possible  from  measurement  error. 
Criticism  seems  to  be  a  first  response  to  profiles,  not  just  in  the  case  of  the  HCFA 
mortality  data.  Information  from  profiles  will  be  less  likely  to  be  criticized  and  more  likely 
to  be  acted  upon  if  it  is  credible,  i.e.,  not  fraught  with  obvious  errors.  This  is  often  a 
difficult  condition  to  meet  given  the  quality  of  information  about  clinical  practice. 
Nevertheless,  meeting  it  is  worthwhile.  Thus,  for  example,  surgeons  presented  with  data 
on  length  of  stay  are  more  likely  to  act  upon  it  if  they  know  that  the  lengths  of  stay 
attributed  to  them  represent  patients  truly  under  their  control  rather  than  stays  acquired 
under  the  control  of  other  attending  physicians  before  transfer  to  the  surgical  service 
(Sarah  Pedersen,  Center  for  Cost-Effective  Care,  Brigham  and  Women's  Hospital, 
personal  communication). 
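To illustrate the length-of-stay example, the sketch below contrasts two ways of attributing hospital days. One charges a whole stay to the discharging surgeon. The other counts only the days each physician actually attended. The record layout (`Segment`), physician names, and stays are hypothetical. Real data would come from whatever transfer records a hospital keeps.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Segment:
    attending: str  # physician responsible for this part of the stay
    days: int       # days spent under that physician's care


def days_by_discharging_physician(stays):
    """Naive attribution: the whole stay is charged to the last (discharging) attending."""
    totals = defaultdict(int)
    for segments in stays:
        totals[segments[-1].attending] += sum(seg.days for seg in segments)
    return dict(totals)


def days_under_own_care(stays):
    """Attribute each day only to the physician who was attending on that day."""
    totals = defaultdict(int)
    for segments in stays:
        for seg in segments:
            totals[seg.attending] += seg.days
    return dict(totals)


# Hypothetical stays; the first patient spent 6 days on medicine before transfer to surgery.
stays = [
    [Segment("medicine_A", 6), Segment("surgeon_X", 4)],
    [Segment("surgeon_X", 5)],
    [Segment("surgeon_Y", 3)],
]
print(days_by_discharging_physician(stays))
print(days_under_own_care(stays))
```

Under the naive rule surgeon_X is charged 15 days, only 9 of which were spent under his or her care.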

Another  way  to  improve  the  usefulness  of  practice  information  and  make  it  more 
actionable is to provide "benchmarks". David T. Kearns, chief executive officer of Xerox
Corporation, has defined benchmarking as "the continuous process of measuring products,
services,  and  practices  against  the  toughest  competitors  or  those  companies  recognized  as 
industry  leaders"  (Camp  1989).  The  availability  of  benchmark  data  from  other  institutions 
or  settings  gives  a  context  for  interpretation  of  one's  own  data.  Even  though  the  formal 
studies  cited  in  the  Appendix  do  not  provide  evidence  that  peer  comparison  was 
necessarily  more  effective  than  other  types  of  feedback,  many  hold  the  opinion  that  it  is 
important  to  provide  comparison  or  context.  The  notion  of  benchmarking  includes 
providing  information  on  optimal  performance  to  stimulate  individuals  or  groups  to 
improve. 

When  we  noted  earlier  the  desirability  of  presenting  data  in  a  time  series  or  control  chart 
we  essentially  presented  an  example  of  increasing  the  meaningfulness  of  profile 
information  by  demonstrating  its  statistical  stability.  The  reader  will  recall  that  Figure  4-2 
shows  information  on  cesarean  section  rates  for  two  HMOs.  We  concluded  that  if  the  two 
organizations  were  serving  similar  populations  there  must  be  a  difference  between  them 
in the process-of-care. The information in Figure 4-2 also illustrates the points we have
been  making  about  actionability.  Information  on  cesarean  section  rates  alone,  just  like 
information  on  coronary  bypass  mortality,  does  not  dictate  an  action.  Once  one  has 
concluded  that  the  data  are  stable  and  there  is  a  difference  worth  addressing,  one  must 
begin  to  look  more  specifically  at  actionable  process  steps.  For  example,  one  could 
examine the rates of performing vaginal births after cesarean section, the so-called VBAC
rate. Since most, but not all, women who have had a prior cesarean section can have a
VBAC or at least a trial of labor, a low rate of VBACs or trials of labor would indicate that
the repeat cesarean section rate is higher than it should be. These data should be more
meaningful  to  clinicians.  The  problem  is  that  it  is  much  harder  to  obtain  profile 
information  on  VBACs  or  trials  of  labor  than  it  is  to  obtain  overall  cesarean  section  rates. 

There  also  may  be  a  trade-off  between  obtaining  actionable  information  and  obtaining 
statistically  stable  information.  For  example,  if  one  tries  to  calculate  VBAC  rates  by 
obstetrician,  the  numbers  may  be  very  small  and  the  rates  unstable.  In  such  a  situation  it 
may  be  better  to  collect  and  present  information  on  a  group  of  physicians  rather  than 
individuals.  Theoretically,  such  information  may  still  be  actionable:  If  the  group  performs 
much  worse  than  a  scientific  norm  or  empirical  benchmark,  then  all  in  the  group  should 
conclude  that  their  individual  performance  could  improve.  In  one  controlled  trial, 
however,  that  precise  strategy  did  not  work  (Lomas  et  al.  1991). 
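The trade-off between actionability and stability can be seen numerically. The sketch below uses a Wilson score interval as the measure of uncertainty. This is a swapped-in standard method, since the paper names none. It compares interval widths for a few obstetricians with very small numbers of eligible patients against the same data pooled for the group. All counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes, n, confidence=0.95):
    """Wilson score interval for a binomial proportion, clipped to [0, 1]."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, center - half), min(1.0, center + half)


# Hypothetical counts: (VBACs or trials of labor, women with a prior cesarean) per obstetrician.
obstetricians = {"ob_1": (1, 4), "ob_2": (2, 5), "ob_3": (0, 3), "ob_4": (3, 6)}

for name, (x, n) in obstetricians.items():
    lo, hi = wilson_interval(x, n)
    print(f"{name}: {x}/{n} = {x / n:.2f}, 95% interval {lo:.2f}-{hi:.2f}")

group_x = sum(x for x, _ in obstetricians.values())
group_n = sum(n for _, n in obstetricians.values())
lo, hi = wilson_interval(group_x, group_n)
print(f"group: {group_x}/{group_n} = {group_x / group_n:.2f}, 95% interval {lo:.2f}-{hi:.2f}")
```

Each individual interval spans more than half the possible range. The pooled interval is narrower, at the cost of no longer describing any one physician.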

At  times  there  may  be  a  difference  between  increasing  the  meaningfulness  of  the  content 
to  physicians  and  providing  data  which  are  most  useful  to  the  health  care  system  or  its 
customers. For example, one might choose to feed back to primary care physicians either the
rates at which they order mammograms for women in their practice over age 50 who have
had a visit within the last year, or the rates at which women in their practice
have actually received a mammogram. The former figure may be easier for the physician
to "buy into", since it represents the portion of the breast cancer screening process over
which the primary care physician has the greatest control. Yet it is the latter figure that represents
the  bottom  line  of  performance  of  the  practice.  Ultimately  it  is  important  to  get  physicians 
to  "buy  into"  overall  performance  measures  whenever  possible.  The  physician  may  not  be 
responsible  for  most  issues  of  patient  non-compliance,  but  the  physician  can  be  extremely 
helpful  in  determining  that  non-compliance  is  the  principal  problem.  The  physician  can 
also  be  extremely  helpful  in  working  out  the  strategies  which  best  address  the  problem. 
With  the  current  emphasis  on  quality  management  in  health  care,  physicians  need  to 
become  increasingly  aware  that  even  though  they  may  not  have  total  control  over  a  care 
process  they  can  and  must  play  an  important  role  in  improving  it.  We  shall  return  to  this 
point  later  when  we  discuss  the  issues  of  physician  autonomy  and  participation  in  clinical 
quality  management  (see  below,  "The  Physician  and  Quality  Management"). 
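The two candidate measures differ mainly in their denominators and in which step of the screening process they count. In this sketch the ordering rate uses the denominator stated in the text (women over 50 seen in the last year). The receipt rate is assumed to use all women over 50 in the practice. The record fields and the patient panel are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    age: int
    visited_last_year: bool
    mammogram_ordered: bool
    mammogram_received: bool


def ordering_rate(patients):
    """Share of women over 50 seen in the last year for whom a mammogram was ordered."""
    seen = [p for p in patients if p.age > 50 and p.visited_last_year]
    return sum(p.mammogram_ordered for p in seen) / len(seen)


def receipt_rate(patients):
    """Share of all women over 50 in the practice who actually received a mammogram."""
    eligible = [p for p in patients if p.age > 50]
    return sum(p.mammogram_received for p in eligible) / len(eligible)


# Hypothetical practice panel.
panel = [
    Patient(62, True, True, True),
    Patient(55, True, True, False),    # ordered but not completed
    Patient(70, True, False, False),   # seen, no order
    Patient(58, False, False, False),  # not seen in the last year
    Patient(66, True, True, True),
    Patient(45, True, False, False),   # under 50, outside both measures
]
print(f"ordering rate: {ordering_rate(panel):.2f}")
print(f"receipt rate:  {receipt_rate(panel):.2f}")
```

The gap between the two figures comes from women who were not seen and from orders that were never completed. These are exactly the parts of the process that, as argued above, the physician can help diagnose without directly controlling them.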

In  this  section  of  the  discussion  we  have  developed  the  idea  that  the  content  of  profile 
information  influences  the  effectiveness  of  feedback  of  the  information.  The  examples  we 
have  given  demonstrate  that  the  issues  of  stability  and  actionability  of  the  data  can  be 
independent or interrelated.

Process Issues in Enhancing the Meaningfulness of Profile Information and Feedback

We  are  going  to  discuss  three  process  issues  which  may  be  related  to  the  effectiveness  of 
feedback.  These  are  participation  in  the  development  of  profiles,  interaction  in  the 
feedback  step,  and  the  timing  of  feedback. 

Although it may not be absolutely necessary to enlist the participation of physicians in the
development of profiles and feedback information, it is highly desirable to do so. This is
particularly  true  if  the  profile  information  does  not  derive  from  a  scientifically 
demonstrated  medical  "fact".  Participation,  or  involvement,  is  almost  universally  believed 
to  be  a  process  which  improves  the  meaningfulness  of  the  information  or  its  presentation. 
If  the  information  is  not  dictated  by  science,  in  many  instances  physician  involvement  will 
actually  determine  or  alter  the  content  of  the  information  obtained  and  delivered  as 
feedback.  Physician  participation  often  leads  to  a  consensus  process.  Even  when  it  does 
not,  however,  participation  may  contribute  to  better  performance.  Members  of  the 
medical  community  are  more  likely  to  feel  that  the  process  is  equitable  if  they  know  they 
have  been  heard. 

Another  process  step  which  is  believed  to  enhance  the  impact  of  profile  information  is 
getting  physicians  to  interact  as  a  group  in  the  interpretation  of  the  profiles.  Interaction 
can  be  considered  participation  at  a  step  following  development  of  profiles.  Similarly,  using 
opinion  leaders  or  one-on-one  detailers  to  deliver  feedback  information  may  enhance  the 
effectiveness of feedback. This can be considered a form of physician interaction at the
recipient  phase  of  the  feedback  process.  All  of  these  types  of  interactions  could  be 
conceptualized  as  forms  of  associating  feedback  with  other  interventions.  In  general,  as 
noted  above,  when  feedback  is  associated  with  one  or  more  management  interventions 
such  as  prompts  or  reminders,  it  may  be  more  effective. 

The  time  interval  from  the  occurrence  of  the  events  which  are  profiled  to  the  receipt  of 
feedback  is  a  process  issue  which,  in  the  literature  accumulated  to  date  (see  Appendix), 
does  not  seem  to  be  a  major  factor  in  the  success  of  feedback.  It  is  possible  that  a  more 
important  issue  is  the  timing  of  feedback  of  profile  information  in  relation  to  a  clinical 
decision  which  could  be  altered  by  the  profile.  The  extremes  of  this  timing  have  not  been 
explored,  particularly  the  notion  of  concurrent  feedback  of  profile  information. 

Concurrent  feedback,  which  might  be  considered  the  ultimate  in  someone's  looking  over 
a  physician's  shoulder  in  the  middle  of  the  process  of  clinical  practice,  can  be  effective. 
For  example,  in  a  study  of  the  effectiveness  of  computerized  feedback,  the  cost  of  tests  in 
an  ambulatory  clinic  setting,  which  is  non-profile  information,  was  given  concurrently  with 
the  process  of  computerized  test  ordering  (Tierney  et  al.  1990).  The  intervention  was 
successful  so  long  as  the  feedback  of  test  costs  was  given.  There  also  was  a  very  rapid 
extinction  of  the  effect  when  the  feedback  was  withdrawn.  Similarly,  information  on 
appropriate days of care when fed back concurrently to attending physicians led to
decreases in overall lengths of stay, presumably by altering the physicians' performance on
the day of the feedback and subsequent days, i.e., in the midst of the clinical
decision-making process (Restuccia 1982). Again, the information did not come from profiles but
rather  from  predetermined  standards  of  appropriate  days. 

In  order  to  apply  the  concept  of  concurrent  feedback  to  profile  information  one  would 
need  to  have  relevant  profiles  available,  most  likely  in  a  computer  system.  For  instance, 
as  primary  care  physicians  were  performing  order-entry  for  a  test  (e.g.,  sedimentation  rate 
or  thyroid  stimulating  hormone)  they  might  be  notified  that  their  personal  ordering  of  this 
test  in  the  past  three  months  was  x  tests/100  encounters  vs  a  clinic  average  of  y  tests/100 
encounters.  Whether  such  information  would  have  an  effect  or  have  a  better  effect  than 
simple  educational  messages  about  the  tests  or  cost  information  is,  of  course,  unknown. 
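The order-entry notification described above amounts to two rates computed over a rolling window. A minimal sketch follows. It assumes order and encounter records are available in the order-entry system. The record layout, the 90-day window standing in for "three months," the pooled clinic average (all clinic orders over all clinic encounters), and the message wording are all hypothetical.

```python
from datetime import date, timedelta


def per_100(count, encounters):
    """Tests per 100 encounters, or None when there are no encounters."""
    return None if encounters == 0 else 100 * count / encounters


def order_entry_message(test, physician, orders, encounters, today, window_days=90):
    """Hypothetical profile note displayed while `physician` enters an order for `test`.

    orders: iterable of (physician, test, order_date)
    encounters: iterable of (physician, encounter_date)
    """
    start = today - timedelta(days=window_days)  # "past three months" approximated as 90 days
    test_orders = [doc for doc, t, d in orders if t == test and start <= d <= today]
    visits = [doc for doc, d in encounters if start <= d <= today]
    x = per_100(test_orders.count(physician), visits.count(physician))
    y = per_100(len(test_orders), len(visits))  # clinic-wide pooled rate
    if x is None or y is None:
        return f"Not enough recent encounters to profile {test} ordering."
    return (f"Your {test} ordering in the past {window_days} days: {x:.1f} tests/100 encounters "
            f"vs a clinic average of {y:.1f} tests/100 encounters.")


# Hypothetical clinic data.
today = date(1992, 1, 15)
orders = [
    ("dr_a", "TSH", date(1991, 12, 1)),
    ("dr_a", "TSH", date(1992, 1, 5)),
    ("dr_b", "TSH", date(1991, 11, 20)),
    ("dr_a", "ESR", date(1992, 1, 2)),
    ("dr_b", "TSH", date(1991, 6, 1)),  # outside the 90-day window
]
encounters = [("dr_a", date(1992, 1, 2))] * 10 + [("dr_b", date(1991, 12, 1))] * 10
print(order_entry_message("TSH", "dr_a", orders, encounters, today))
```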

Theoretical  Underpinnings  of  Feedback 

The  use  of  feedback  is  based  upon  underlying  rational  models  of  behavior  and  behavior 
change  (Eisenberg  1986;  Restuccia  1982;  Rosser  1983;  Kanouse  et  al.  1988;  Goldman  1990; 
Raisch  1990;  and  Epstein  1991).  Several  notions  are  included  in  these  models:  One  is 
that  when  physicians  see  new  information  they  will  change  their  previous  beliefs  which 
ultimately  will  lead  to  a  change  in  their  actions.  A  second  is  that  physicians  cannot 
accurately  assess  their  own  practices  without  specific  information,  an  idea  which  Rosser 
(1983)  has  called  the  "perception-reality"  gap.  Another  is  that  physicians  are  highly 
motivated  to  change  or  extremely  competitive.  This  model  assumes  that  when  physicians 
are  given  some  information  about  their  own  performance  vs  that  of  others  or  vs  a  norm, 
they  will  promptly  change  their  practices  in  order  to  meet  or  beat  the  competition. 

Yet,  physicians'  motivation  to  change  cannot  be  taken  as  a  given  (Kanouse  et  al.  1988); 
and  it  does  not  appear  to  be  straightforward.  For  example,  it  is  not  correct  to  assume  that 
clinicians'  motivation  always  derives  from  the  acquisition  of  new  knowledge.  In  the  study 
on  the  effects  of  concurrent  cost  information  on  computerized  test-ordering  (Tierney  et 
al.  1990)  there  was  no  evidence  that  the  group  which  modified  its  behavior  acquired  new 
knowledge.  Goldman  (1990)  has  suggested  that  in  this  study  the  change  may  be  due  to  the 
simple fact that the physicians receiving the cost information knew they were being
observed, which is equivalent to saying that the intervention produced a Hawthorne effect.

In  another  study  which  attempted  to  understand  the  motivations  of  physicians  to  perform, 
the  influenza  vaccination  knowledge,  attitudes  and  practices  of  a  group  of  staff  model 
HMO  physicians  were  examined  (Sprauer  et  al.  1989).  These  physicians  were  participants 
in  an  influenza  immunization  program  which  provided  computerized  prompts,  individual 
and  peer-comparison  feedback  (Barton  and  Schoenbaum  1990).  Interestingly,  high- 
performing  physicians  reported  that  performance  tracking  encouraged  influenza 
vaccination,  whereas  low-performing  physicians  reported  that  it  had  little  effect  on  their 
practices. There was also a group of physicians whose performance improved between two
years of observation. These physicians also were more likely to report an effect of
performance  tracking  than  the  low-performing  group. 

It  is  believed  that  when  people  make  inferences  they  usually  use  certain  "rules  of  thumb" 
or  heuristics,  in  particular,  ones  which  have  been  called  "representativeness",  "availability", 
"framing"  and  "anchoring"  (Tversky  and  Kahneman  1974;  Kahneman  and  Tversky  1984). 
Raisch (1990) has related the effectiveness of various interventions for influencing
physician behavior to these heuristics and gives several examples: For instance, in a
randomized  controlled  trial  of  interventions  designed  to  increase  the  rates  of  VBAC  and 
decrease  the  rates  of  repeat  cesarean  section  among  groups  of  obstetricians,  opinion 
leaders  who  delivered  an  educational  intervention  were  effective;  whereas,  feedback  of 
group  performance  was  not  effective  (Lomas  et  al.  1991).  Raisch  would  explain  the 
ineffectiveness  of  Lomas'  use  of  group  feedback  on  individual  performance  as  a  problem 
with  representativeness  of  the  information.  Individual  physicians  might  have  perceived 
only  an  indirect  relationship  of  the  feedback  information  to  their  specific  work  with 
individual  patients. 

Raisch attributes "the need for frequent feedback . . . to the availability heuristic because
repetition  makes  information  more  accessible".  He  might  use  this  as  an  explanation  for 
the  effectiveness  of  the  feedback  of  cost  information  in  Tierney's  study.  In  addition, 
Raisch  attributes  the  advantage  of  using  opinion  leaders  or  having  one-on-one  contact  to 
"vividness",  which,  in  turn,  is  related  to  the  availability  of  the  information  vs  that  derived 
from  more  impersonal  sources. 

We  mentioned  previously  that  feedback  has  been  more  successful  when  it  has  been 
employed  in  studies  to  improve  quality  of  care  than  in  studies  to  decrease  cost  or 
utilization.  Raisch  suggests  that  this  fact  should  lead  those  interested  in  improving 
physician performance to frame the issue as one which relates to quality of care, not just
cost.  This  type  of  framing  has  been  an  important  component  of  the  industrial  model  of 
quality  improvement. 

Finally,  Raisch  observes  that  anchoring  is  an  issue  when  one  attempts  to  alter  performance 
which  is  the  result  of  a  prior  clear  intention  on  the  part  of  the  physician.  Most  physicians 
have  developed  certain  practices  on  the  basis  of  their  "clinical  experience",  even  though 
they  have  no  scientific  information  to  support  them.  They  will  be  resistant  to  changing 
these  practices.  In  contrast,  most  physicians  also  recognize  that  they  have  no  basis, 
including  clinical  experience,  for  some  of  their  practices.  The  theory  suggests  that  one  is 
more  likely  to  achieve  success  with  feedback  when  it  provides  new  information  to  a 
physician  whose  previous  performance  was  not  grounded  in  a  notion  of  what  is  "right"  to 
do in a certain circumstance.

These heuristics may seem like "after-the-fact" reasoning, but they are useful for
structuring  or  framing  our  ideas  about  the  effectiveness  of  feedback  and  for  designing 
future  research  to  improve  our  understanding  of  effectiveness. 

APPROACHES  TO  THE  MANAGEMENT  OF  OUTLIERS  BY  PROFILING 

Profile  information  is  frequently  used  to  identify  outliers.  Peer  Review  Organization 
(PRO) programs regularly use this technique (James Cannon, Utah PRO; Kim Downs and
Kathy Michael, Iowa Foundation for Medical Care; and Greg Simmons, Wisconsin PRO;
personal communications). For example, under current PRO regulations, nurses review
the  hospital  records  of  approximately  25  percent  of  all  Medicare  discharges  and  abstract 
information  from  each  record  which  leads  to  a  quality  point  score.  These  scores  are 
aggregated  by  physicians  and  tabulated  quarterly.  Physicians  who  have  high  scores 
indicating  potential  quality  problems  are  handled  by  a  variety  of  mechanisms  depending 
on  the  height  of  the  score.  The  first  step  is  usually  to  provide  the  information  to  the 
physician  who  then  has  an  opportunity  to  respond  to  it  before  there  is  a  final  classification 
of  the  point  score.  Ultimately,  the  point  score  determines  the  PRO's  intervention,  which 
ranges  from  education  of  the  physician  or  intensified  review  of  the  physician's  Medicare 
patients'  hospital  records  to  sanctions  such  as  removal  from  the  Medicare  reimbursement 
program.  PROs  also  use  profiling  to  pick  up  statistical  outliers  ~  individual  physicians  and 
hospitals  whose  performance  on  a  profile  such  as  mortality  is  more  than  two  standard 
deviations  above  the  mean.  Most  often  this  leads  to  intensified  review  of  the  performance 
of  the  outlier  by  the  PRO  with  further  interventions  dependent  upon  the  results  of  the 
review. 
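
The statistical-outlier screen just described can be sketched in a few lines of Python. This is a minimal illustration, not the PROs' actual procedure: it assumes one rate per provider and uses the population standard deviation (the text does not say which estimator is used), and the function name, hospital identifiers, and mortality rates are all hypothetical.

```python
from statistics import mean, pstdev


def high_outliers(rates: dict[str, float], k: float = 2.0) -> list[str]:
    """Return providers whose rate lies more than k standard deviations above the mean.

    One-sided screen: only unusually high values (e.g. mortality) are flagged.
    """
    values = list(rates.values())
    mu = mean(values)
    sd = pstdev(values)  # population SD across all profiled providers (an assumption)
    if sd == 0:
        return []  # identical rates: nobody stands out
    return sorted(p for p, r in rates.items() if (r - mu) / sd > k)


# Hypothetical mortality rates for ten hospitals.
mortality = {
    "H01": 0.050, "H02": 0.048, "H03": 0.052, "H04": 0.051, "H05": 0.049,
    "H06": 0.050, "H07": 0.047, "H08": 0.053, "H09": 0.050, "H10": 0.120,
}

print(high_outliers(mortality))  # ['H10']
```

With only ten providers, the extreme value inflates the standard deviation itself, so no single provider can lie more than three standard deviations from the mean here; real screens usually compare much larger peer groups.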

A  similar  use  of  profiling  information,  in  which  the  profile  is  created  by  peer  review  of 
physician  adherence  to  guidelines  developed  by  specialty  societies,  is  being  considered  in 
a  controversial  plan  for  a  physician  recredentialing  program  in  New  York  (Gellhorn  1991). 
This  plan  proposes  a  mandatory  educational  program,  not  loss  of  license,  to  correct 
deficiencies. 

There are also ongoing attempts to use profiles of physician utilization practices as part
of a process for determining which physicians are eligible to become credentialed in
pre-paid networks or to receive hospital privileges. Outliers would be excluded. These
practices have been referred to as "economic credentialing".

The  above  are  examples  of  the  use  of  profile  information  to  affect  medical  practice 
primarily  by  regulatory  or  "stick"  approaches  to  outliers  or  "bad  apples".  The  credentialing 
uses do not involve feedback of profiles, whereas utilization review does include feedback.
Restuccia (1982) has distinguished the conventional utilization review approach taken
by the PROs from a "cybernetic" control system approach. In the conventional approach,
when a discrepancy is found between the actual and desired states, it is presented to a
physician advisor, or adjudicator, who requests more information from the physician and
effectively assumes control of the problem. In the cybernetic approach, the information
is presented directly to the responsible physician, who can then choose to act upon
it. Several striking examples indicate that direct feedback can affect outlier performance.

For several years Blue Shield of Pennsylvania has created profiles of physician utilization
performance (Robert Edmiston, Pennsylvania Blue Shield, and J. Sanford Schwartz,
Leonard Davis Institute, University of Pennsylvania, personal communications). Using
computerized systems, Pennsylvania Blue Shield calculates, for each physician within a
specialty group, the number of procedures of a specific type billed per 100 subscribers
seen by that physician (that is, subscribers for whom any bill was submitted) within a
specified time frame. For example, it would be possible to calculate the number of
colonoscopies billed by a gastroenterologist or by a general surgeon for each 100 Blue
Shield subscribers seen by that physician in the past year. This provides the basic
information from which to examine the distribution of procedure rates for physicians
within a specialty.
Annually,  Pennsylvania  Blue  Shield  sends  a  letter  to  all  physicians  whose  performance  is 
more  than  two  standard  deviations  from  the  mean  explaining  the  monitoring  system  and 
indicating  the  individual  physician's  performance  in  relation  to  the  group  mean.  An 
unpublished  analysis  of  data  from  this  program  indicates  that  the  annual  feedback  alone 
has  had  an  effect  on  physician  behavior.  Physicians  who  receive  the  feedback  letter  often 
change  their  utilization  performance  and  appear  to  do  so  in  a  temporal  relationship  to 
their  receipt  of  the  letter,  suggesting  that  the  effect  is  not  merely  regression  to  the  mean. 
Interestingly,  Pennsylvania  Blue  Shield  also  follows  up  physicians  who  have  been  outliers 
for  three  years  and  has  a  process  of  intensified  review  and  sanctions  as  indicated.  Thus, 
the  overall  Pennsylvania  Blue  Shield  program  addresses  management  of  outliers  both  by 
direct  feedback  and  a  more  conventional  "stick"  or  policing  approach. 
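
The Pennsylvania Blue Shield measure and its two follow-up rules (an annual letter to physicians more than two standard deviations from the specialty mean, and further review of physicians who remain outliers for three years) might be sketched as follows. The function names, physician and year labels, and all counts are hypothetical, and the use of the population standard deviation is an assumption.

```python
from statistics import mean, pstdev


def rates_per_100(procedures: dict[str, int], subscribers_seen: dict[str, int]) -> dict[str, float]:
    """Procedures billed per 100 subscribers seen, for each physician in one specialty."""
    return {
        doc: 100.0 * procedures.get(doc, 0) / seen
        for doc, seen in subscribers_seen.items()
        if seen > 0  # a physician with no subscribers has no defined rate
    }


def two_sided_outliers(rates: dict[str, float], k: float = 2.0) -> set[str]:
    """Physicians more than k SDs from the specialty mean, in either direction."""
    values = list(rates.values())
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        return set()
    return {doc for doc, r in rates.items() if abs(r - mu) > k * sd}


def persistent_outliers(flags_by_year: dict[int, set[str]], years: int = 3) -> set[str]:
    """Physicians flagged in every one of the most recent `years` profiling years."""
    recent = sorted(flags_by_year)[-years:]
    if len(recent) < years:
        return set()
    return set.intersection(*(flags_by_year[y] for y in recent))


# Hypothetical year of colonoscopy billing for ten gastroenterologists.
seen = {f"D{i}": 200 for i in range(1, 11)}
colonoscopies = {"D1": 20, "D2": 22, "D3": 18, "D4": 21, "D5": 19,
                 "D6": 20, "D7": 23, "D8": 17, "D9": 20, "D10": 60}
rates = rates_per_100(colonoscopies, seen)
letters = two_sided_outliers(rates)           # physicians who get the annual letter
print(sorted(letters))                        # ['D10']

# Hypothetical flags from two earlier years plus this one.
history = {1989: {"D4", "D10"}, 1990: {"D10"}, 1991: letters}
print(sorted(persistent_outliers(history)))   # ['D10']
```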

Similar  evidence  for  the  effect  of  feedback  on  outlier  performance  comes  from  a  study  by 
Buck  and  White  (1974).  They  examined  the  effect  of  peer  review  of  ambulatory  claims 
from  a  group  of  physicians  caring  for  Medi-Cal  patients  in  a  forerunner  of  an  independent 
practice  association,  the  San  Joaquin  Foundation  for  Medical  Care.  The  Foundation 
received  capitation  payments  from  the  state,  but  paid  physicians  fee-for-service.  Claims 
were  reviewed  and  adjustments  were  made  as  the  Foundation  saw  appropriate.  The 
authors identified, on a procedure-by-procedure basis, periods in which there was increased
adjustment  activity.  Under  the  assumption  that  increased  adjustment  activity  might  lead 
to  a  decrease  in  the  claims  rate  for  a  specific  procedure,  they  calculated  regression  lines 
for  the  time  periods  leading  up  to  and  following  the  increased  adjustment  activity.  For  13 
of  the  15  procedures  examined  they  found  a  statistically  significant  and  direct  relationship 
between  the  percentage  of  claims  adjusted  and  a  decline  in  the  number  of  services  claimed. 
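
The before-and-after comparison can be illustrated with ordinary least-squares lines fitted separately to the periods before and after adjustment activity increased. This is a hedged sketch: the monthly claims rates and function names are hypothetical, the change point is assumed to be known, and only the slopes are compared; Buck and White's significance testing is not reproduced.

```python
def ols_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def pre_post_slopes(rates: list[float], change_at: int) -> tuple[float, float]:
    """Fit separate trend lines before and from `change_at` onward; return both slopes."""
    months = list(range(len(rates)))
    pre, _ = ols_line(months[:change_at], rates[:change_at])
    post, _ = ols_line(months[change_at:], rates[change_at:])
    return pre, post


# Hypothetical claims per 1,000 enrollees for one procedure; adjustments intensify in month 6.
claims = [50, 51, 52, 53, 54, 55, 54, 52, 50, 48, 46, 44]
before, after = pre_post_slopes(claims, change_at=6)
print(before, after)  # 1.0 -2.0
```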

This  is  not  a  profiling  study  in  that  profiles  were  not  created  in  advance  and  used  as  the 
basis of feedback. It is, however, a study in which one could consider each rejected claim
to be feedback to a physician. Accordingly, physicians with the most extreme utilization
received the most frequent feedback, and this appears to have had an important effect, as
Raisch (1990) would predict from the availability heuristic.

Perhaps  the  best  known  examples  of  the  use  of  direct  feedback  to  manage  outliers  come 
from  the  work  of  Wennberg  and  his  colleagues.  In  a  landmark  study,  Wennberg  et  al. 
(1977)  first  calculated  tonsillectomy  rates  for  small  areas  in  Vermont  in  1969  and 
discovered  that  the  highest  area  had  a  rate  over  three  times  the  state  average  and  about 
four  times  as  high  as  the  lowest  area.  In  1971  the  information  was  fed  back  to  the 
physicians via the state medical society, and by 1973 the rate in the previously highest area
had  fallen  to  about  the  level  of  the  lowest  area.  It  has  remained  at  that  level  through 
1988. 
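
Small-area comparisons of this kind rest on two simple statistics: each area's rate relative to the statewide rate, and the extremal quotient (the highest area rate divided by the lowest). The sketch below computes both; the function names, area labels, populations, and procedure counts are hypothetical and are not the Vermont data.

```python
def area_rates(counts: dict[str, int], populations: dict[str, int], per: int = 10_000) -> dict[str, float]:
    """Procedures per `per` residents for each small area."""
    return {area: per * counts[area] / populations[area] for area in populations}


def variation_summary(counts: dict[str, int], populations: dict[str, int],
                      per: int = 10_000) -> dict[str, float]:
    rates = area_rates(counts, populations, per)
    # The statewide rate pools all areas, so it is weighted by population.
    state = per * sum(counts.values()) / sum(populations.values())
    high, low = max(rates.values()), min(rates.values())
    return {
        "state_rate": state,
        "high_to_state": high / state,
        "extremal_quotient": high / low,
    }


# Hypothetical populations and annual tonsillectomy counts for five small areas.
populations = {"A": 20_000, "B": 30_000, "C": 25_000, "D": 10_000, "E": 15_000}
tonsillectomies = {"A": 40, "B": 45, "C": 50, "D": 60, "E": 30}
summary = variation_summary(tonsillectomies, populations)
print(summary["state_rate"], summary["extremal_quotient"])  # 22.5 4.0
```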

In Maine, the Maine Medical Assessment Foundation has adopted the techniques of
small-area analysis as a basis for feedback and peer involvement (Wennberg and
Gittelsohn 1982; Caper 1991). This program has resulted in several interesting
demonstrations of the effects of feedback for managing outlier areas. In one area of
central Maine, the number of pediatric medical discharges was more than twice the expected
number in 1980. In 1981 the chief of pediatrics in the area's largest hospital began to
provide feedback on the data, and the rates began to fall. In 1984, however, he retired and
the rates began to rise again. The Maine Medical Assessment Foundation's Pediatric
Study Group, recognizing the increase in rates, began to provide additional feedback to
the local physicians in 1985, and the rates declined again, this time to their lowest level.
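
The source does not say how the Foundation computed the "expected number" of discharges. A common method in small-area analysis is indirect standardization: apply statewide age-specific rates to the area's own age structure and sum. The sketch below assumes that method; the function names, age bands, rates, populations, and observed count are all hypothetical.

```python
def expected_count(state_rates_per_1000: dict[str, float], area_population: dict[str, int]) -> float:
    """Discharges the area would have if each age band experienced the statewide rate."""
    return sum(
        state_rates_per_1000[band] * area_population[band] / 1000
        for band in area_population
    )


def observed_to_expected(observed: int, state_rates_per_1000: dict[str, float],
                         area_population: dict[str, int]) -> float:
    return observed / expected_count(state_rates_per_1000, area_population)


# Hypothetical statewide pediatric medical discharge rates, per 1,000 children per year.
state_rates = {"<1": 80, "1-4": 30, "5-14": 10}
# Hypothetical child population of one small area, by age band.
area_children = {"<1": 1_000, "1-4": 4_000, "5-14": 10_000}

expected = expected_count(state_rates, area_children)
ratio = observed_to_expected(640, state_rates, area_children)  # 640 observed (hypothetical)
print(expected, round(ratio, 2))  # 300.0 2.13
```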

A  similar  assessment  by  the  Orthopaedic  Study  Group  in  Maine  discovered  that  there  was 
an  increase  in  the  number  of  lumbar  disc  excisions  in  1983  and  1984  vs  1980-1982  (Keller 
et al. 1990). This proved to be attributable to one small area, which had a lower than
average number of excisions in the earlier period followed by a dramatic rise. Analysis
ruled out a significant change in the population, an alteration in the type of work being done
in the areas, and data error. It did find that three new surgeons had begun practicing in
the area and that the increase in the overall rate appeared to be attributable to their practices.
Information  on  the  variation  in  lumbar  disc  excision  rates  was  presented  to  a  state-wide 
group  of  orthopedists  and  neurosurgeons  in  a  feedback  meeting  in  early  1985.  The  rate 
in  the  previously  high  area  promptly  fell  to  about  the  state  average.  As  the  authors  point 
out,  "The  study-group  did  not  think  that  the  increased  rate  of  operative  treatment 
indicated  wrongdoing  on  the  part  of  surgeons  from  Area  Three.  Rather,  the  variation  was 
considered  to  indicate  uncertainty  about  the  best  way  to  treat  a  herniated  disc.  Until  the 
feedback  meeting  was  held,  the  physicians  had  no  information  about  how  their  patterns 
of practice compared with those of their peers." In Raisch's framework, the physicians of
Area Three were not anchored to a particular pattern of practice, so the feedback was
more likely to be effective.
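
Tracing a statewide increase to a single area amounts to comparing each area's average annual count across the two periods and asking what share of the total change each area contributes. A minimal sketch with hypothetical function names and counts (they are not the Maine data); it uses raw counts, which is reasonable only when, as the analysis found, the areas' populations did not change appreciably.

```python
def period_mean(counts_by_year: dict[int, int], years: range) -> float:
    """Average annual count over the given years."""
    return sum(counts_by_year[y] for y in years) / len(years)


def attribute_change(data: dict[str, dict[int, int]], before: range, after: range):
    """Change in mean annual count per area, and each area's share of the statewide change."""
    changes = {area: period_mean(c, after) - period_mean(c, before) for area, c in data.items()}
    total = sum(changes.values())
    shares = {area: ch / total for area, ch in changes.items()} if total else {}
    return changes, shares


# Hypothetical lumbar disc excision counts by area and year.
excisions = {
    "Area One":   {1980: 100, 1981: 98, 1982: 102, 1983: 101, 1984: 103},
    "Area Two":   {1980: 80, 1981: 81, 1982: 79, 1983: 78, 1984: 80},
    "Area Three": {1980: 18, 1981: 20, 1982: 22, 1983: 50, 1984: 60},
    "Area Four":  {1980: 60, 1981: 61, 1982: 59, 1983: 60, 1984: 62},
}
changes, shares = attribute_change(excisions, range(1980, 1983), range(1983, 1985))
print(changes["Area Three"], round(shares["Area Three"], 2))  # 35.0 0.95
```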


There have been other situations in which profiles used to identify outlier performance have
been linked to interventions other than feedback, which, in turn, have had an impact on
practice. For example, utilization and quality-of-care profiles have been employed by some
pre-paid health care organizations, such as US Healthcare, in determining performance
bonuses (Stocker 1989). Though the process applies to all physicians in the organization,
and thus there are no controls, there appear to have been overall improvements in some
areas of performance on a year-to-year basis (Michael Stocker and Steven Zatz, personal
communication). One important feature of the implementation of US Healthcare's
program is that it has been perceived as rewarding excellent performance, not just
punishing poor performance.

Two  groups  of  investigators  have  used  profiling  to  identify  physicians  with  high  prescribing 
patterns  for  certain  drugs  and  have  then  targeted  educational  interventions  rather  than 
feedback  at  these  physicians  (Schaffner  et  al.  1983;  Ray  et  al.  1985a,  1985b,  1986,  1987; 
Avorn and Soumerai 1983; Soumerai and Avorn 1987, 1990). Schaffner's group has had
particular  success  with  trained  physician  counselors  who  visit  the  targeted  practitioners. 
Avorn  and  Soumerai  have  had  similar  success  with  doctoral-level  clinical  pharmacists  and 
have  called  the  process  "academic  detailing". 

LINKING  FEEDBACK  OF  PRACTICE  PROFILES  TO  PROCESS/QUALITY 
IMPROVEMENT 

Theoretically,  profiles  can  be  obtained  for  any  measurable  aspect  of  practice.  Some 
aspects  of  practice  are  purely  cognitive;  e.g.,  the  decision  of  when  to  perform  lumbar  disc 
excision.  Other  aspects  of  practice,  even  when  seemingly  simple,  are  intimately  tied  to 
complex  processes  of  care;  e.g.,  effective  immunization  of  adults  (Barton  and  Schoenbaum 
1990)  or  follow-up  of  positive  Pap  smears  (Schoenbaum  and  Gottlieb  1990).  Performance 
of these practices is far from perfect (Schoenbaum 1990a). Although simple feedback
of information on overall performance of these practices usually leads to improvement,
feedback alone does not achieve the desired standard.

Feedback  alone  can  be  considered  a  form  of  exhortation.  It  provides  information  which 
can,  and  often  does,  lead  to  change  by  serving  as  a  carrot  or  stick.  As  we  observed  in  the 
discussion  above  concerning  the  effectiveness  of  process  vs  outcome  information,  the  data 
alone  usually  do  not  elaborate  the  entire  process  of  care  which  underlies  a  practice  or  lead 
uniformly  to  the  design  and  implementation  of  process  change  and  improvement.  In 
short,  while  obtaining  practice  profile  information  and  feeding  it  back  are  essential  first 
steps  in  overall  process  improvement,  they  are  just  first  steps.  To  achieve  exceptional 
performance  these  first  steps  must  be  linked  explicitly  to  interventions  which  will  manage 
or  control  the  process  of  care. 


The  "Convener"  Function  as  a  Step  in  Understanding  Processes-of-Care 

As an example of a next step, it should be noted that when the Maine Medical Assessment
Foundation attempted to understand and change a practice, it became a "convener". It
brought  together  groups  of  physicians  to  understand  the  data  and  explore  reasons  for 
variation.  Interestingly,  in  the  study  on  lumbar  disc  excisions,  not  only  did  the  outlier  area 
change  its  practice,  but  there  also  was  a  decrease  in  the  overall  variation  in  performance 
around  the  state  (Keller  et  al.  1990).  Since  the  process  under  study  was  considered 
cognitive,  simply  getting  physicians  together  to  discuss  their  indications  for  doing  an 
operative  procedure  seems  to  have  been  effective.  It  is  likely  that  for  the  most  important 
aspects  of  medical  practice  it  will  be  necessary  to  go  further  into  understanding  the  process 
of care than holding a simple discussion. Yet, as noted earlier, HCFA failed to
perform the "convener" function when it first published hospital mortality data, and it is
possible that more serious work to address the data would have been done sooner had the
function been exercised.

At  Harvard  Community  Health  Plan  (HCHP)  we  discovered  a  few  years  ago  that  there 
were  substantial  (one  and  one-half  fold)  differences  in  cesarean  section  rates  for  our 
patients  who  were  receiving  care  in  two  large  Boston  hospitals  (Schoenbaum  1990b). 
Convening  our  obstetricians  led  to  further  analysis  of  the  data  and  the  elimination  of 
several  potential  explanations  for  the  variation.  It  also  elucidated  several  differences  in 
processes-of-care  between  the  hospitals.  Some  of  the  processes  of  the  hospital  with  the 
lower rate were introduced into the other hospital. These included changes in approaches
to obstetrical anesthesia and to the performance of vaginal births after cesarean section
(VBACs). The cesarean section rate fell in the higher-rate hospital, and for the past few
years the rates at the two hospitals have been essentially identical. Currently, in the hope
of markedly reducing cesarean section rates,
one  of  these  hospitals  is  engaged  in  a  large  controlled  trial  of  a  radically  redesigned  set  of 
processes  for  prenatal  care  and  for  the  management  of  labor. 

The  Idea  of  Quality  Management  in  Health  Care 

In  the  face  of  competition  from  Japanese  industry,  the  idea  of  managing  quality  has 
become almost an obsession in American industry. By now most are aware that the
Japanese  began  their  development  of  high  quality  production  processes  and  continuous 
quality  improvement  with  the  early  post-war  assistance  of  Americans,  including  W. 
Edwards  Deming  and  Joseph  Juran.  The  principles  and  practices  of  quality  management 
can  certainly  be  applied  to  health  care;  and,  as  in  other  industries,  they  involve  a 
combination  of  skills,  methods  and  attitudes  (Deming  1986;  Juran  1989;  Berwick  1989; 
James  1989).  These  combine  aspects  of  management  which  in  the  past  have  come  under 
the  rubrics  of  the  academic  disciplines  of  organizational  behavior  and  operations 
management/engineering.  For  those  whose  interest  may  be  more  on  profiling  than  on  the 
mechanics  of  operations  management,  it  is  worth  noting  that  Deming  began  his  career  as 
a  statistician. 


Quality  management  is  data-driven.  One  begins  with  a  problem  which,  ideally,  has  been 
identified  or  elucidated  with  data  (see  the  discussion  above  of  distributions,  variation,  and 
statistical  control).  One  next  tries  to  understand  the  production  process  which  has  led  to 
the  result.  Examples  such  as  the  approaches  to  cesarean  section  rates  at  HCHP  and 
lumbar  disc  surgery  by  the  Maine  Medical  Assessment  Foundation  can  be  considered 
quality  management  of  what  Deming  calls  "special  cause"  variation  or  what  we  have  been 
referring to as outliers. If, however, one focusses only on the high outliers in a normal
distribution, one has neglected 97.5 percent of the performers. There are many examples
to  indicate  that  the  performance  of  non-outliers  can  be  improved  dramatically;  i.e.,  the 
mean  performance  can  be  shifted  and  the  distribution  of  performance  narrowed.  The 
result  is  an  increase  in  quality,  decrease  in  waste,  and,  often,  a  decrease  in  cost  (U.S.  GAO 
1991). 
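
The 97.5 percent figure follows from the normal distribution: about 2.5 percent of values lie more than 1.96 standard deviations above the mean (about 2.3 percent lie more than exactly 2 standard deviations above it). The sketch below checks those tail areas with Python's standard library and then shows, for a purely hypothetical shift of the mean down by half a standard deviation with the spread narrowed to 0.8 standard deviations, how few performers would still exceed the original cutoff, assuming higher values are worse.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)

# Share of performers above common outlier cutoffs.
above_1_96 = 1 - std_normal.cdf(1.96)   # about 0.025
above_2_00 = 1 - std_normal.cdf(2.0)    # about 0.023
print(f"beyond 1.96 SD: {above_1_96:.4f}, beyond 2 SD: {above_2_00:.4f}")

# Hypothetical improvement of the whole distribution (higher = worse):
# the mean falls by 0.5 SD and the spread narrows to 0.8 SD.
improved = NormalDist(mu=-0.5, sigma=0.8)
still_above = 1 - improved.cdf(1.96)    # about 0.001
print(f"share still above the old 1.96 SD cutoff: {still_above:.4f}")
```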

There  is  a  defined  set  of  quality  management  techniques  which  can  be  employed  to  shift 
or enhance performance (Juran 1989). A sine qua non is obtaining as detailed an
understanding of the production process as possible. Sometimes there is no clear-cut process, or
the  process  is  considered  so  flawed  that  it  is  desirable  to  replace  it.  Addressing  this 
problem  leads  one  to  employ  the  techniques  which  Juran  has  called  "Quality  Planning". 
An  example  of  this  was  just  given  above  when  we  noted  that  one  Boston  hospital  is 
involved  in  a  trial  of  radically  redesigned  processes  of  obstetrical  care. 

Often  there  is  a  defined  and  generally  reasonable  process,  but  the  outcome  is  less  than 
desired.  In  such  instances  one  can  employ  the  techniques  which  Juran  has  called  "Quality 
Improvement".  In  any  case,  once  a  process  has  been  designed  or  redesigned,  implemented 
and  shown  to  be  effective,  one  needs  to  use  so-called  "Quality  Control"  techniques  to 
ensure  that  the  improvement  will  be  sustained.  Quality  control  involves  measurement  to 
assure  that  performance  is  in  the  range  in  which  it  was  designed  to  fall.  Thus 
measurement,  or  profiling,  plays  two  important  roles  in  quality  management.  It  is  essential 
for  defining  the  areas  of  performance  worth  improving,  and  it  is  essential  for  monitoring 
subsequent  performance. 
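
For rates such as a monthly cesarean section percentage, one standard quality-control tool is Shewhart's p-chart: the center line is the pooled proportion, and the control limits sit three standard errors away, given each month's number of deliveries. The source does not name a specific control technique, so this is an illustrative choice; the function name and monthly counts are hypothetical.

```python
from math import sqrt


def p_chart(events: list[int], totals: list[int], sigmas: float = 3.0):
    """Return (center line, per-period (lcl, ucl) limits, indices of out-of-control periods)."""
    p_bar = sum(events) / sum(totals)  # pooled proportion across all periods
    limits, flagged = [], []
    for i, (x, n) in enumerate(zip(events, totals)):
        se = sqrt(p_bar * (1 - p_bar) / n)  # binomial standard error for this period's size
        lcl, ucl = max(0.0, p_bar - sigmas * se), min(1.0, p_bar + sigmas * se)
        limits.append((lcl, ucl))
        if not lcl <= x / n <= ucl:
            flagged.append(i)
    return p_bar, limits, flagged


# Hypothetical monthly cesarean sections out of 200 deliveries each month.
cesareans = [24, 22, 26, 23, 25, 24, 45, 23]
deliveries = [200] * len(cesareans)
center, limits, out_of_control = p_chart(cesareans, deliveries)
print(f"center line {center:.4f}; months out of control: {out_of_control}")
# center line 0.1325; months out of control: [6]
```

In routine use the limits would usually be set from a stable baseline period and then applied to later months; pooling every month, as here, is the simplest version.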

There  are  still  very  few  published  examples  of  applications  of  quality  management 
techniques  to  clinical  care.  Nevertheless,  since  none  of  the  developers  or  popularizers  of 
quality  management  methodology  has  a  monopoly  on  the  techniques,  it  is  apparent  that 
over  the  years  health  care  institutions  have  done  efficacious  quality  planning  or 
improvement  projects  without  stating  that  they  were  doing  so.  One  such  example  is 
described by Myers and Gleicher (1988), who instituted a program to manage obstetrical
labor  and  delivery  at  Mount  Sinai  Hospital  in  Chicago  which  had  the  effect  of  lowering 
cesarean  section  rates  from  17.5  percent  in  1985  to  11.5  percent  in  1987  and  forceps 
deliveries  and  midpelvic  procedures  from  10.4  to  4.3  percent.  The  program  consisted  of 
the  design  and  implementation  of  several  process  steps  which  applied  to  each  physician 
who  participated  (participation  was  mandatory  for  house  staff  and  voluntary  for  the 
hospital's private staff). The prescribed processes for participating physicians included
documented second opinions for all cesarean sections except true emergencies; a policy
preferring vaginal birth for all patients who had previous cesarean sections (VBAC), and
communication of that policy to patients at the first prenatal visit; and criteria for when
cesarean section for dystocia and fetal distress was acceptable. There was an optional
policy  for  vaginal  delivery  of  most  breech  presentations.  Importantly,  there  was  a  quality 
control  process  which  consisted  of  peer  review  of  the  process  steps  listed  above.  Cesarean 
section  performance  and  quality  control  data  were  fed  back  in  monthly  staff  conferences. 
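
The Mount Sinai figures can be restated as absolute and relative reductions. The arithmetic below uses only the percentages reported above; the helper function is hypothetical.

```python
def reductions(before_pct: float, after_pct: float) -> tuple[float, float]:
    """Absolute drop in percentage points and relative drop as a fraction of the baseline."""
    absolute = before_pct - after_pct
    return absolute, absolute / before_pct


cesarean_abs, cesarean_rel = reductions(17.5, 11.5)      # 1985 -> 1987
operative_abs, operative_rel = reductions(10.4, 4.3)     # forceps and midpelvic deliveries
print(f"cesarean: {cesarean_abs:.1f} points, {cesarean_rel:.0%} relative")
print(f"forceps/midpelvic: {operative_abs:.1f} points, {operative_rel:.0%} relative")
# cesarean: 6.0 points, 34% relative
# forceps/midpelvic: 6.1 points, 59% relative
```

A drop of 6.0 percentage points corresponds to 60 fewer cesarean sections per 1,000 deliveries, roughly a one-third relative reduction.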

Given  the  maternal  morbidity  and  costs  associated  with  cesarean  section  and  the  finding 
that  the  program  at  Mount  Sinai  Hospital  successfully  lowered  cesarean  section  rates 
without  increasing  fetal  or  neonatal  mortality  rates  (they  were  unchanged),  it  can  be 
concluded  that  the  program  both  increased  quality  of  care  and  decreased  utilization  and 
cost. 

The  Physician  and  Quality  Management 

If one thinks about it, it is remarkable that Myers and Gleicher were able to carry out
their program. Both the hospital and its privately practicing physicians stood to lose
money if it was implemented successfully. Also, it would not be
surprising  if  the  physicians  were  concerned  about  their  risk  of  being  sued  for  malpractice 
in  the  event  of  a  poor  obstetrical  outcome  in  a  woman  who  did  not  undergo  cesarean 
section.  Finally,  the  physicians,  knowing  they  and  their  practices  would  be  monitored  more 
closely  than  in  the  past,  could  be  concerned  about  their  loss  of  professional  autonomy. 
These  concerns  or  conflicting  values  will  undoubtedly  torpedo  even  some  well-planned 
quality  management  efforts  in  other  institutions.  Nevertheless,  the  Mount  Sinai  experience 
is  a  landmark  for  demonstrating  what  can  be  done  to  enlist  physicians  and  health  care 
institutions  in  utilization/cost  management. 

It  should  be  obvious  from  the  nature  of  the  examples  and  discussion  above  that  although 
physicians  might  consider  surveillance  of  their  practice  to  be  a  loss  of  autonomy,  there  are 
highly  responsible  uses  of  profiling  information  which  require  the  active  participation  of 
knowledgeable  physicians  to  take  full  advantage  of  opportunities  for  improving  care.  The 
physicians  who  participated  in  the  Mount  Sinai  program  may  have  lost  autonomy  in  the 
sense  that  they  no  longer  could  do  whatever  they  pleased  and  be  unobserved.  This  false 
autonomy  was  replaced  by  their  understanding  that  a)  they  would  be  practicing  within  a 
controlled  framework,  into  which  they,  as  a  group,  had  input;  b)  the  object  was  to  decrease 
operative deliveries while at the same time preserving or enhancing neonatal and maternal
outcomes, a clear-cut improvement in quality; c) the data on their performance would be
available to them, i.e., the feedback would be a "carrot"; and d) in the event of a poor
outcome, if they followed the defined processes and procedures subscribed to by the group,
their risk of losing a malpractice suit should actually be less. In short, even if "autonomy"
is lost, it should be a highly responsible and gratifying professional act to participate in
clinical  quality  management  activities  and  ensure  that  the  results  will,  indeed,  lead  to  a 
betterment  of  practice. 

THE  FUTURE 

Two  current  trends  indicate  that  there  will  be  more  profiling  of  clinical  practices  in  the 
future:  One  is  the  demand  from  the  public  for  more  accountability  in  health  care.  The 
other  is  an  increasing  recognition  within  the  health  care  industry  that  significant  quality 
problems  exist. 

The  Demand  for  Accountability 

Payors,  including  the  government  and  employers,  and  the  general  public  are  the  customers 
of  the  health  care  system.  They  are  concerned  both  about  cost  and  quality  of  care.  This 
is  reflected  in  their  growing  demands  that  the  health  care  system  provide  measurements 
of  its  performance  and  that  it  develop  and  implement  practice  guidelines.  In  turn, 
guidelines  are  becoming  less  general  and  are  beginning  to  specify  the  critical  steps  or 
processes  of  health  care.  This  type  of  guideline  provides  opportunities  for,  indeed 
necessitates,  targeted  profiling  of  performance  of  the  critical  processes. 

Accountability  often  has  negative  connotations.  If  profiling  is  used  only  or  primarily  as 
part  of  a  process  to  punish  "bad"  or  even  "less  good"  apples,  it  is  likely  to  have  less  impact 
on  practice  than  it  could  have,  for  several  reasons:  Outliers,  by  definition,  are  a  small 
percentage  of  the  population,  and  one  could  be  focussing  one's  efforts  on  a  larger  area  of 
practice  (see  above,  "The  Idea  of  Quality  Management  in  Health  Care").  Secondly, 
regulatory  or  punitive  uses  of  information  may  have  a  chilling  effect  on  positive  uses  of 
information.  Thirdly,  the  net  effect  may  be  marginal.  With  the  exception  of  those  who 
are  ignorant  or  dishonest,  outlier  behavior  is  not  consistently  associated  with  any  single  set 
of  individuals.  Very  few  individuals  are  bad  or  exceptional  at  everything.  Accordingly, 
when  one  simply  ties  bad  or  exceptional  performance  to  punishment  or  reward  and  when 
there  are  many  different  performance  variables,  one  gets  an  averaging  of  the  punishments 
and  rewards  for  most  individuals.  Thus,  ultimately,  no  one  is  particularly  rewarded  or 
punished,  and  one  has  gone  through  a  lot  of  effort  at  scoring  and  distributing  rewards  or 
punishments.  Fourthly,  the  "reward"  may  not  be  desirable:  Imagine  a  group  of  general 
hospitals  profiled  on  the  basis  of  their  performance  of  hip  surgery  and  cardiac  surgery  by 
a  consortium  of  payors  who  would  like  to  direct  more  business  to  the  "best"  institutions. 
A  couple  of  hospitals  do  extremely  well  at  cardiac  surgery  and  a  couple  at  hip  surgery;  but 
they  are  different  hospitals.  The  consortium  now  may  try  to  direct  all  of  its  cardiac 
surgery  and  hip  surgery  cases  to  the  hospitals  which  have  done  the  best.  On  the  other 
hand,  these  general  hospitals  may  have  no  desire  to  absorb  all  of  the  cardiac  or  hip  surgery 
that could come their way. In effect, they would be transformed into heart hospitals or
orthopedic hospitals, a change that may not be consonant with what they see as their "mission".
By  perceiving  the  result  of  their  good  performance  as  a  problem  rather  than  as  a  reward, 
they  may  lose  any  motivation  to  improve  that  area  of  performance  further.  Indeed,  seeing 
that  their  limited  services  are  in  greater  demand,  they  may  respond  simply  by  increasing 
their  prices. 
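
The averaging argument in the third point above can be illustrated with a small simulation. It assumes, hypothetically, that each physician's standardized performance on each of 20 measures is independent and normally distributed, with a reward for a result more than two standard deviations in the good direction and a penalty for a result more than two standard deviations in the bad direction. Independence is the strongest form of the observation that few individuals are bad or exceptional at everything; real measures would be somewhat correlated. The function name and all parameters are illustrative.

```python
import random
from collections import Counter


def net_scores(n_physicians: int = 10_000, n_measures: int = 20,
               cutoff: float = 2.0, seed: int = 1) -> list[int]:
    """Net reward (+1 per exceptional measure, -1 per poor one) for each simulated physician."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    scores = []
    for _ in range(n_physicians):
        net = 0
        for _ in range(n_measures):
            z = rng.gauss(0.0, 1.0)  # standardized performance on one measure; positive = better
            if z > cutoff:
                net += 1
            elif z < -cutoff:
                net -= 1
        scores.append(net)
    return scores


scores = net_scores()
dist = Counter(scores)
n = len(scores)
share_zero = dist[0] / n
share_big = sum(v for k, v in dist.items() if abs(k) >= 3) / n
largest = max(abs(s) for s in scores)
print(f"net score 0: {share_zero:.3f}; |net| >= 3: {share_big:.3f}; largest |net| of 20: {largest}")
```

Under these assumptions close to half of the simulated physicians end with a net score of zero, and very few accumulate a net reward or penalty of three or more, which is the averaging effect the text describes.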

Recognition  of  the  Existence  of  Quality  Problems  in  Health  Care 

For  many  years,  indeed  a  few  decades,  we  denied  the  existence  of  quality  problems  in 
health care. We defined quality as access to new technologies and largely ignored the
variation in use of existing technologies, decision-making errors, skill deficits, and
inadequate integration of care over episodes of illness. In this context, our increasing
recognition of the existence of quality problems in health care can be considered positive.
Identification of quality issues and subsequent management of those issues
provide opportunities for designing profiles, feeding back the results, and instituting
quality management approaches to improve performance (see above).

Though  everyone  is  talking  about  "quality"  in  health  care,  there  are  some  who  see 
measurement  and  quality  management  as  entrepreneurial  opportunities,  a  chance  to 
develop  a  measurement  industry  of  "black  boxes"  or  a  consulting  industry;  others  who  see 
it  as  a  chance  to  develop  highly  quantitative  assessment  or  scoring  systems  primarily  so 
that  purchasers  can  direct  their  business  to  the  low-cost-adequate-quality  or  possibly  to  the 
high-quality  producer;  and  yet  others  who  see  all  of  this  as  a  "public  good"  with  the  hope 
that  everyone  will  find  enough  value  in  it  that  there  will  be  a  continuous  commitment  to 
accountability  and  improvement  of  the  problems  which  inevitably  occur  in  complex 
systems.  The  movement  to  introduce  quality  management  into  the  health  care  industry 
will  be  most  likely  to  succeed  if  its  proponents  are  able  to  demonstrate  that  within  the 
health  care  system,  as  within  some  other  industries  here  and  abroad,  the  principles  of 
quality  management  do  truly  lead  to  better  "products"  and  ultimately  to  more  satisfied 
customers,  both  patients  and  payors. 

We  believe  that  there  are  several  reasons  to  favor  the  model  of  developing  quality 
assessment  and  management  as  a  public  good.  Unfortunately,  the  infrastructure  for 
profiling  and  translation  of  profiles  into  positive  forces  for  medical  practice  and  health 
care  system  improvement  is  just  beginning  to  be  put  in  place;  and  like  just  about 
everything  else  in  our  pluralistic  society,  there  is  no  unified  scheme  for  accomplishing  the 
work.  There  are  several  components  which  will  need  to  be  thought  out  and  implemented. 
These  potentially  include  fundamental  changes  in  our  conception  of  medical  practice 
which,  in  turn,  would  necessitate  fundamental  changes  in  our  approach  to  medical 
education. 


Bare  Bones  Outline  of  a  Schema  for  Change  to  a  Profiling/Quality  Management  Model 

It  is  beyond  the  scope  of  this  paper  to  develop  an  entire  schema  for  change.  Nevertheless, 
the  basic  notion  can  be  conceptualized  as  follows:  The  current  medical  model  is 
constructed  on  a  foundation  of  biotechnical  knowledge  which,  in  turn,  derives  from  the 
building  blocks  of  basic  science  research.  Physicians  are  prepared  for  medical  practice  by 
teaching  them  the  elements  of  the  basic  sciences  and  training  them  in  settings  where  they 
can  acquire  technical  skills  and  experience  in  using  medical  technology.  Although  medical 
students are taught some epidemiologic principles and how to examine data critically, their
own practices and those of their academic role models are not subjected to systematic
evaluation.  Redesign  of  the  educational  and  practice  models  would  require  the 
development  and  implementation  of  data  systems  to  track  clinician  and  system 
performance. 

The  current  model  of  medical  practice  is  also  based  upon  the  concept  of  physician 
autonomy.  Physicians  are  expected  to  be  able  to  provide  care  or  develop  a  local  care 
system,  pretty  much  on  their  own.  Medical  education  is  designed  to  provide  the  student 
with  enough  tools  so  that  she/he  can  function  in  such  a  world.  Physicians  undoubtedly  are 
and  are  likely  to  remain  the  most  influential  professional  group  involved  in  the  delivery 
of  medical  care;  but  physicians  need  to  become  increasingly  aware  that  they  are 
functioning  in  a  complex  set  of  systems  along  with  many  other  people.  Physicians  will  need 
to  learn  how  managed  systems  and  organizations  function  and  how  they  can  be  improved. 
They  will  need  a  much  better  understanding  of  the  costs  of  providing  care  so  that  they  can 
participate  intelligently  in  the  design  of  more  cost-efficient  systems  of  care.  In  order  to 
engender  greater  satisfaction,  they  will  also  need  to  learn  more  about  behavioral  aspects 
of  care.  None  of  this  is  currently  a  serious  part  of  medical  education. 

Despite  the  fact  that  "medical  service"  income  has  become  the  single  largest  funding  source 
for US medical schools (Jolin et al. 1991) and now accounts for over 40 percent of all
revenue,  an  amount  which  equals  the  contributions  from  federal  research  and  state  and 
local  governments,  the  schools  appear  to  be  the  part  of  the  health  care  system  most 
isolated  from  direct  customer  needs.  Physicians'  expectations  and  role  models  derive  first 
from  medical  school  experiences.  It  is  critical,  as  noted  above,  that  we  develop  the  next 
generation of physicians in the quality management model. It would be regrettable if medical
schools were the last to change, when they should be among the first.

Cautionary  Notes 

Although  our  overall  assessment  is  that  the  potential  exists  for  profiles  to  have  a  strongly 
positive  impact  on  medical  practice  in  the  future,  there  are  also  a  couple  of  cautionary 
considerations.  One  of  these  relates  to  the  limitations  of  profiles.  Although  these  have 
been  discussed  in  other  papers  for  this  conference,  the  limitation  that  we  would  like  to 
emphasize  is  that  it  is  virtually  impossible  to  profile  all  aspects  of  medical  practice.  Some 


95 


medical  conditions  or  processes  occur  frequently  enough  so  that  they  can  be  profiled  in 
a  meaningful  unit  of  aggregation;  others  do  not.  It  is  not  an  accident  that  several  of  the 
examples  in  this  paper  are  based  on  obstetrical  delivery  practices.  It  is  an  area  of  medical 
care  with  relatively  large  numbers.  There  are  and  will  remain  many  aspects  of  medical 
practice  for  which,  despite  our  best  efforts,  it  will  not  be  possible  to  obtain  enough 
meaningful  information. 
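The point about units of aggregation can be illustrated numerically. The following Python sketch is not part of the original analysis. It computes an approximate 95 percent confidence interval for an observed rate using the normal (Wald) approximation to the binomial. The function name `rate_interval` and the case counts are hypothetical. The same observed rate of 20 percent tells us far less about a physician with 10 cases than about one with 1,000. This is why pooling cases across a group or over a longer period is often necessary.

```python
import math


def rate_interval(events: int, cases: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% confidence interval for an observed rate (Wald approximation).

    Illustrative only: the Wald interval is crude for small counts and for
    rates near 0 or 1.
    """
    if cases <= 0:
        raise ValueError("cases must be positive")
    p = events / cases
    half_width = z * math.sqrt(p * (1 - p) / cases)
    # Clamp the interval to the valid range for a proportion.
    return max(0.0, p - half_width), min(1.0, p + half_width)


# The same observed 20% rate at three hypothetical caseloads.
for events, cases in [(2, 10), (20, 100), (200, 1000)]:
    lo, hi = rate_interval(events, cases)
    print(f"{events}/{cases}: {lo:.3f} to {hi:.3f}")
# 2/10: 0.000 to 0.448
# 20/100: 0.122 to 0.278
# 200/1000: 0.175 to 0.225
```

An exact or Wilson interval would be a better choice in practice. The qualitative lesson about caseload is the same.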

A second important problem is that collecting profile information, assessing it, and then
devising programs that have an impact on practice all cost time and money. Ultimately,
economic  or  investment  decisions  must  be  made  to  determine  which  aspects  of  practice 
should  be  subjected  to  this  process  and  which  should  not. 

SUMMARY  AND  RECOMMENDATIONS 
Summary 

Our  goals  in  this  paper  have  been  to  enhance  the  reader's  understanding  of  the  basic 
ingredients  involved  in  any  consideration  of  improving  physician  and  health  care  system 
performance  and  to  stimulate  discussion  of  the  most  productive  ways  to  accomplish 
improvement.  Profiling  is  an  essential  ingredient,  and  we  have  indicated  that  it  can  have 
an  effect  on  practice  performance  in  two  ways:  one  is  a  regulatory  or  "stick"  approach;  the 
other is a quality management or "carrot" approach. Regulatory approaches apply
primarily to performance outliers, whereas quality management approaches can be applied
to the entire distribution of performance.

Feedback  of  profile  information  is  an  extremely  important  intervention  which  itself  can 
stimulate  practice  modification  or  can  be  associated  with  other  management  interventions. 
We  have  reviewed  much  of  the  available  literature  on  the  feedback  of  practice  profile 
information  (see  Appendix  4A)  and  have  categorized  several  content  and  process  issues 
which  appear  to  be  related  to  the  success  of  the  feedback  process. 

We  have  introduced  the  subject  of  quality  management  so  that  the  reader  can  see  at  least 
an  outline  of  the  process  by  which  information  from  profiles  can  be  linked  to  a  purposeful 
set  of  interventions  and  yield  increasingly  better  performance. 

We  acknowledge  that  an  implication  of  the  increasing  surveillance  of  medical  practice  and 
health  care  system  performance  is  a  decrease  in  physician  autonomy  in  the  old, 
unrestrained sense of the term. In fact, we believe that the increasing complexity of
the health care system and of the practice of medicine necessitates a redefinition of the
concept of autonomy. We believe that it can be supplanted with the concept that physicians
have the right and responsibility to improve their practices and to work with others to
improve the health care system. We firmly believe that physicians, when properly trained
to play their important role in improving health practices, will find that role extremely
gratifying.

Our  society  is  pluralistic.  No  one  institution,  organization,  or  agency  can  unilaterally 
change  the  health  care  system.  We  believe  that  the  current  "crisis"  in  health  care  in  the 
United  States  is  a  crisis  both  in  quality  and  cost.  We  believe  that  a  resolution  to  this  crisis 
will  involve  redesigning  medicine  and  medical  care.  In  effect,  we  have  argued  for  the  need 
to  institute  change  at  all  levels  of  the  health  care  system  starting  in  medical  school. 

In  our  view,  "practice  profiles"  are  a  code  term  for  the  critical  measurements  which  identify 
problems  and  drive  a  system  of  improvement  and  also  for  the  critical  measurements  which 
assess  whether  improvement  has  occurred.  We  have  demonstrated  with  many  examples 
that  profiles  can  have  an  impact.  The  crucial  issue  is  whether  we  will  be  complacent  with 
any  impact  or  whether  we  are  willing  to  engage  in  the  hard  work  of  maximizing  impact. 

Recommendations 

The  stage  has  been  set  in  this  paper  for  several  recommendations  to  improve  profiling  and 
maximize  its  impact.  We  have  already  noted  that  the  principles  and  practices  of  quality 
management  involve  a  combination  of  skills,  methods,  and  attitudes.  Accordingly,  the 
spirit  of  our  recommendations  is  that  we  must  be  open  to  developing  new  skills,  learning 
new  methods,  and  adopting  different  attitudes. 

A  first  recommendation  is  that  those  who  are  involved  in  profiling  activities  redesign  the 
process  for  obtaining  profiles.  We  need  to  abandon  the  notion  of  profiling  data  simply 
because  the  data  are  there.  Instead,  we  must  focus  on  what  data  we  think  we  need  and 
why;  i.e.,  we  need  to  state  hypotheses  about  care  processes  and  outcomes.  Then  we  must 
design  systems  to  enable  us  to  obtain  the  information  we  need.  It  is  likely  that  this  will 
entail  the  development  and  enhancement  of  automated  clinical  information  systems 
(Schoenbaum and Barnett, in press; Institute of Medicine 1991).

Second,  we  must  acquire  an  understanding  of  the  methods  of  quality  management  and 
apply  them  to  health  care.  At  the  moment  it  is  hard  to  find  a  health  care  organization  in 
this  country  which  does  not  say  it  is  involved  in  continuous  quality  improvement  or  total 
quality  management.  Relatively  few,  however,  have  actually  changed  to  a  customer- 
centered,  data-driven  and  process-oriented  approach  to  the  design  and  execution  of  all 
aspects  of  care.  We  are  not  yet  doing  the  simple  things  such  as  examining  performance 
for  statistical  stability  or  understanding  customer  needs. 
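The paper does not prescribe a particular method for examining performance "for statistical stability." One common technique in the quality management literature (see Deming 1986) is the Shewhart p-chart. It plots each period's rate against control limits of p_bar plus or minus 3 * sqrt(p_bar * (1 - p_bar) / n). The Python sketch below assumes monthly counts of a binary event. The counts, the labels, and the names `Period` and `p_chart` are invented for illustration.

```python
import math
from typing import NamedTuple


class Period(NamedTuple):
    label: str
    events: int  # e.g., cesarean deliveries in the month (hypothetical)
    cases: int   # e.g., total deliveries in the month (hypothetical)


def p_chart(periods: list[Period]) -> tuple[float, list[tuple[str, float, float, float, bool]]]:
    """Shewhart p-chart: pooled center line and 3-sigma limits for each period."""
    p_bar = sum(p.events for p in periods) / sum(p.cases for p in periods)
    rows = []
    for period in periods:
        rate = period.events / period.cases
        sigma = math.sqrt(p_bar * (1 - p_bar) / period.cases)
        lcl = max(0.0, p_bar - 3 * sigma)
        ucl = min(1.0, p_bar + 3 * sigma)
        # Outside the limits: a signal of a possible special cause.
        # Inside the limits: common-cause variation of a stable system.
        rows.append((period.label, rate, lcl, ucl, not (lcl <= rate <= ucl)))
    return p_bar, rows


# Hypothetical monthly counts; May is deliberately unusual.
months = [
    Period("Jan", 10, 100), Period("Feb", 12, 100), Period("Mar", 9, 100),
    Period("Apr", 11, 100), Period("May", 30, 100), Period("Jun", 10, 100),
]
center, rows = p_chart(months)
for label, rate, lcl, ucl, signal in rows:
    print(f"{label}: rate={rate:.2f} limits=({lcl:.3f}, {ucl:.3f}) {'SIGNAL' if signal else ''}")
```

A point outside the limits (here, May) is a signal worth investigating for a special cause. Variation within the limits is better addressed by changing the system than by singling out individuals. Including the unusual month in the pooled center line widens the limits somewhat, so analysts often recompute the limits after an assignable cause has been removed.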

Third,  as  the  health  care  industry  and  physicians  in  it  become  increasingly  interested  in 
defining  and  improving  the  processes  of  care,  it  is  imperative  for  nonphysicians  involved 
in  health  care  to  learn  more  about  medical  processes  so  that  they  understand  the 
limitations of clinical information and medical science. It is disturbing to discover that
there  are  large  numbers  of  persons  involved  in  the  health  care  industry  who  know  very 
little  about  medicine  and  medical  care  and  believe  that  simplistic  solutions  such  as  finding 
and  punishing  "bad  apples"  will  solve  current  system  problems. 

Fourth,  there  is  a  need,  as  in  most  aspects  of  health  and  medical  care,  for  more  research. 
We  have  indicated  that  profiling  is  most  likely  to  be  effective  if  it  is  scientifically  grounded. 
This  argues  strongly  for  increased  attention  to  medical  effectiveness  or  outcomes  research. 
Another  research  need  is  to  develop  better  information  on  how  to  maximize  the 
effectiveness  of  feedback  and  other  interventions  to  improve  cognitive  areas  of  clinical 
practice  (e.g.,  reminders  and  prompts).  For  the  foreseeable  future,  medicine  is  likely  to 
remain  a  field  involving  human  practitioners.  There  needs  to  be  additional  research  on 
maximizing  the  effectiveness  of  human  input  into  development  of  profiles,  feedback, 
guidelines,  etc.  For  example,  one  could  study  the  role  of  participation  or  involvement. 

Last,  but  not  least,  changes  leading  to  a  more  accountable,  data-driven,  managerially- 
oriented  health  care  system  would  ideally  have  begun  with  changes  in  medical  education. 
That  has  not  happened;  but  it  is  not  too  late  for  productive  changes  to  be  developed  and 
instituted  in  medical  school  and  post-graduate  training. 

ACKNOWLEDGMENTS 

We  would  like  to  acknowledge  the  following  persons  who  were  willing  to  share  their 
thoughts  and  prior  experience:  Jon  Alsip,  Howard  Bailit,  Charles  Buck,  Jr.,  James 
Cannon,  Katherine  Coltin,  Kim  Downs,  Jonathan  Edelson,  Robert  Edmiston,  Lawrence 
Gottlieb,  Kathryn  Langwell,  Kathy  Michael,  Daniel  Murrey,  Heather  Palmer,  Sarah 
Pedersen,  Susanne  Salem-Schatz,  William  Schaffner,  J.  Sanford  Schwartz,  Greg  Simmons, 
Michael  Stocker,  Jonathan  Weiner,  and  Steven  Zatz.  They  were  generous  of  their  time 
and  are  in  no  way  responsible  for  errors  in  our  interpretation  of  their  work. 

REFERENCES 

Avorn,  J.  and  S.B.  Soumerai,  "Improving  Drug-Therapy  Decisions  Through  Educational 
Outreach:  A  Randomized  Controlled  Trial  of  Academically  Based  'Detailing'," 
The  New  England  Journal  of  Medicine  308:1457-1463,  1983. 

Barton,  M.B.  and  S.C.  Schoenbaum,  "Improving  Influenza  Vaccination  Performance  in  an 
HMO  Setting:  The  Use  of  Computer-Generated  Reminders  and  Peer 
Comparison  Feedback,"  American  Journal  of  Public  Health  80:534-536,  1990. 


Berwick,  D.M.,  "Continuous  Improvement  as  an  Ideal  in  Health  Care,"  The  New  England 
Journal  of  Medicine  320:53-56,  1989. 

Berwick,  D.M.,  "The  Double  Edge  of  Knowledge,"  Journal  of  the  American  Medical 
Association  266:841-842,  1991. 

Buck,  Jr.,  C.R.  and  K.L.  White,  "Peer  Review:  Impact  of  a  System  Based  on  Billing 
Claims,"  The  New  England  Journal  of  Medicine  291:877-883,  1974. 

Camp,  R.C.,  Benchmarking:  The  Search  for  Industry  Best  Practices  that  Lead  to  Superior 
Performance  (Milwaukee:  ASQC  Press,  1989). 

Caper,  P.,  "Population-Based  Measures  of  the  Quality  of  Medical  Care,"  in  J.B.  Couch,  ed., 
Health  Care  Quality  Management  for  the  21st  Century  (Tampa:  American  College 
of  Physician  Executives,  1991),  pp.  281-327. 

Deming,  W.E.,  "Chapter  11:  Common  Causes  and  Special  Causes  of  Improvement:  Stable 
System"  in  W.E.  Deming,  Out  of  the  Crisis  (Cambridge:  Massachusetts  Institute 
of  Technology,  Center  for  Advanced  Engineering  Study,  1986),  pp.  309-370. 

Eisenberg, J.M., Doctors' Decisions and the Cost of Medical Care: The Reasons for Doctors'
Practice  Patterns  and  Ways  to  Change  Them  (Ann  Arbor:  Health  Administration 
Press,  1986). 

Epstein,  A.M.,  "Changing  Physician  Behavior:  Increasing  Challenges  for  the  1990s," 
Archives  of  Internal  Medicine  151:2147-2149,  1991. 

Fineberg,  H.V.,  A.R.  Funkhouser,  and  H.  Marks,  "Variation  in  Medical  Practice:  A 
Review  of  the  Literature,"  in  R.  Egdahl  and  D.C.  Walsh,  eds.,  Health  Cost 
Management  and  Medical  Practice  Patterns,  Volume  2  (Boston:  Ballinger 
Publishing  Co.,  1985),  pp.  143-168. 

Freiman,  J.A.,  T.C.  Chalmers,  H.  Smith,  et  al.,  "The  Importance  of  Beta,  the  Type  II  Error 
and  Sample  Size  in  the  Design  and  Interpretation  of  the  Randomized  Control 
Trial:  Survey  of  71  'Negative'  Trials,"  The  New  England  Journal  of  Medicine 
299:690-694,  1978. 

Gellhorn,  A.,  "Periodic  Physician  Recredentialling,"  Journal  of  the  American  Medical 
Association  265:752-755,  1991. 

Goldman,  L.,  "Changing  Physicians'  Behavior:  The  Pot  and  the  Kettle,"  The  New  England 
Journal  of  Medicine  322:1524-1525,  1990. 


Institute  of  Medicine  (U.S.)  Committee  on  Improving  the  Patient  Record,  R.S.  Dick  and 
E.B.  Steen,  eds.,  The  Computer-Based  Patient  Record:  An  Essential  Technology  for 
Health  Care  (Washington,  DC:  National  Academy  Press,  1991). 

James,  B.C.,  Quality  Management  for  Health  Care  Delivery  (Chicago:  The  Hospital 
Research  and  Educational  Trust,  1989). 

Jolin,  L.D.,  P.  Jolly,  J.Y.  Krakower,  et  al.,  "U.S.  Medical  School  Finances,"  Journal  of  the 
American  Medical  Association  266:985-990,  1991. 

Juran, J.M., Juran on Leadership for Quality: An Executive Handbook (New York: The Free
Press,  a  Division  of  Macmillan,  Inc.,  1989). 

Kahneman,  D.  and  A.  Tversky,  "Choices,  Values,  and  Frames,"  American  Psychologist 
39:341-50,  1984. 

Kanouse,  D.E.  and  I.  Jacoby,  "When  Does  Information  Change  Practitioners'  Behavior?" 
International  Journal  of  Technology  Assessment  in  Health  Care  4:27-33,  1988. 

Keller,  R.B.,  D.N.  Soule,  J.E.  Wennberg,  et  al.,  "Dealing  with  Geographic  Variations  in  the 
Use  of  Hospitals:  The  Experience  of  the  Maine  Medical  Assessment  Foundation 
Orthopaedic Study Group," Journal of Bone and Joint Surgery 72-A:1286-1293,
1990. 

Lomas,  J.,  M.  Enkin,  G.M.  Anderson,  et  al.,  "Opinion  Leaders  vs  Audit  and  Feedback  to 
Implement  Practice  Guidelines:  Delivery  after  Previous  Cesarean  Section," 
Journal  of  the  American  Medical  Association  265:2202-2207,  1991. 

Myers,  S.A.  and  N.  Gleicher,  "A  Successful  Program  to  Lower  Cesarean  Section  Rates," 
The  New  England  Journal  of  Medicine  319:1511-1516,  1988. 

O'Connor,  G.T.,  S.K.  Plume,  E.M.  Olmstead,  et  al.,  "A  Regional  Prospective  Study  of  In- 
Hospital  Mortality  Associated  with  Coronary  Artery  Bypass  Grafting,"  Journal  of 
the  American  Medical  Association  266:803-809,  1991. 

Raisch,  D.W.,  "A  Model  of  Methods  for  Influencing  Prescribing:  Part  II.  A  Review  of 
Educational  Methods,  Theories  of  Human  Inference,  and  Delineation  of  the 
Model,"  DICP  Annals  of  Pharmacotherapeutics  24:537-542,  1990. 

Ray,  W.A.,  W.  Schaffner,  and  C.F.  Federspiel,  "Persistence  of  Improvement  in  Antibiotic 
Prescribing  in  Office  Practice,"  Journal  of  the  American  Medical  Association 
253:1774-1776, 1985a.


Ray, W.A., R. Fink, W. Schaffner, et al., "Improving Antibiotic Prescribing in Outpatient
Practice: Nonassociation of Outcome with Prescriber Characteristics and
Measures of Receptivity," Medical Care 23:1307-1313, 1985b.

Ray,  W.A.,  D.G.  Blazer  II,  W.  Schaffner,  et  al.,  "Reducing  Long-Term  Diazepam 
Prescribing  in  Office  Practice:  A  Controlled  Trial  of  Educational  Visits,"  Journal 
of  the  American  Medical  Association  256:2536-2539,  1986. 

Ray,  W.A.,  D.G.  Blazer  II,  W.  Schaffner,  et  al.,  "Reducing  Antipsychotic  Drug  Prescribing 
for  Nursing  Home  Patients:  A  Controlled  Trial  of  the  Effect  of  an  Educational 
Visit,"  American  Journal  of  Public  Health  77:1448-1450,  1987. 

Restuccia,  J.D.,  "The  Effect  of  Concurrent  Feedback  in  Reducing  Inappropriate  Hospital 
Utilization,"  Medical  Care  20:46-62,  1982. 

Rosser, W.W., "Using the Perception-Reality Gap to Alter Prescribing Patterns," Journal
of Medical Education 58:728-732, 1983.

Schaffner,  W.,  W.A.  Ray,  C.F.  Federspiel,  et  al.,  "Improving  Antibiotic  Prescribing  in 
Office  Practice:  A  Controlled  Trial  of  Three  Educational  Methods,"  Journal  of 
the  American  Medical  Association  250:1728-1732,  1983. 

Schoenbaum,  S.C.,  "When  is  the  Quality  of  Care  Good  Enough?"  American  Journal  of 
Public  Health  80:403-404,  1990a. 

Schoenbaum, S.C., "An Attempt to Manage Variation in Obstetrical Practice," in K.A.
Heithoff and K.N. Lohr, eds., Effectiveness and Outcomes in Health Care
(Washington, DC: National Academy Press, 1990b), pp. 190-200.

Schoenbaum,  S.C.  and  L.K.  Gottlieb,  "Algorithm  Based  Improvement  of  Clinical  Quality," 
British  Medical  Journal  301:1374-1376,  1990. 

Schoenbaum,  S.C.  and  G.O.  Barnett,  "Automated  Ambulatory  Medical  Records  Systems: 
An  Orphan  Technology,"  International  Journal  of  Technology  Assessment  in  Health 
Care,  in  press. 

Soumerai,  S.B.  and  J.  Avorn,  "Predictors  of  Physician  Prescribing  Change  in  an 
Educational  Experiment  to  Improve  Medication  Use,"  Medical  Care  25:210-221, 
1987. 


Soumerai,  S.B.  and  J.  Avorn,  "Principles  of  Educational  Outreach  (Academic  Detailing) 
to  Improve  Clinical  Decision  Making,"  Journal  of  the  American  Medical 
Association  263:549-556,  1990. 

Sprauer, M.A., W.W. Williams, J.W. White, et al., "Influenza Vaccination: Knowledge,
Attitudes,  and  Practices  Among  Physicians  in  an  HMO,"  unpublished  manuscript 
presented  at,  and  abstracted  in  Program  and  Abstracts  of  the  Twenty-Ninth 
Interscience  Conference  on  Antimicrobial  Agents  and  Chemotherapy  (Houston: 
ICAAC,  September,  1989),  p.  195. 

Stocker,  M.,  "Quality  Assurance  in  an  IPA,"  HMO  Practice  3:183-187,  1989. 

Tierney,  W.M.,  M.E.  Miller,  and  C.J.  McDonald,  "The  Effect  on  Test  Ordering  of 
Informing  Physicians  of  the  Charges  for  Outpatient  Diagnostic  Tests,"  The  New 
England  Journal  of  Medicine  322:1499-1504,  1990. 

Tversky,  A.  and  D.  Kahneman,  "Judgment  Under  Uncertainty:  Heuristics  and  Biases," 
Science  185:1124-31,  1974. 

U.S.  General  Accounting  Office,  Management  Practices:  U.S.  Companies  Improve 
Performance Through Quality Efforts, GAO/NSIAD-91-190 (Washington, DC:
GAO,  1991). 

Wennberg,  J.E.,  L.  Blowers,  R.  Parker,  et  al.,  "Changes  in  Tonsillectomy  Rates  Associated 
with  Feedback  and  Review,"  Pediatrics  59:821-826,  1977. 

Wennberg,  J.  and  A.  Gittelsohn,  "Variations  in  Medical  Care  Among  Small  Areas," 
Scientific  American  246:120-129,  1982. 

Williams,  S.V.,  D.B.  Nash,  and  N.  Goldfarb,  "Differences  in  Mortality  from  Coronary 
Artery  Bypass  Graft  Surgery  at  Five  Teaching  Hospitals,"  Journal  of  the  American 
Medical  Association  266:810-815,  1991. 


APPENDIX  4A 


The  following  bibliography  lists  papers  examining  the  feedback  of  profiles  to  physicians. 
We  have  defined  feedback  of  profiles  as  the  presentation  to  a  physician  or  group  of 
physicians  of  a  statistical  summary  or  overview  depicting  practice  patterns.  For  inclusion 
in  this  bibliography,  the  feedback  must  be  based  on  data  derived  from  a  group  of  cases  (a 
profile)  as  opposed  to  an  ad  hoc,  case  by  case,  medical  record  audit.  The  summary  may 
consist  of  process  or  outcome  information  on  individuals  or  a  group  of  clinicians.  It  must 
also  include  information  taken  from  the  practice  of  the  physician  to  whom  the  data  are 
presented  (see  section  on  Feedback  formats,  in  the  main  body  of  the  paper). 
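As a concrete but hypothetical illustration of this definition, the Python sketch below turns case-level records into a simple peer comparison profile. Each physician's mean value (for example, laboratory charges per visit) appears alongside the case-weighted group mean. The record layout, the name `peer_comparison`, and the data are assumptions; the studies listed below used a variety of formats.

```python
from collections import defaultdict
from statistics import mean


def peer_comparison(records):
    """Build a simple profile: each physician's mean value alongside the group mean.

    `records` is an iterable of (physician_id, value) pairs, one per case,
    e.g. charges per visit. This record layout is an illustrative assumption.
    """
    by_physician = defaultdict(list)
    for physician, value in records:
        by_physician[physician].append(value)
    all_values = [v for values in by_physician.values() for v in values]
    group_mean = mean(all_values)  # case-weighted mean over the whole group
    return {
        physician: {
            "cases": len(values),
            "mean": mean(values),
            "group_mean": group_mean,
            "difference": mean(values) - group_mean,
        }
        for physician, values in by_physician.items()
    }


# Hypothetical per-visit laboratory charges for three physicians.
visits = [("A", 40.0), ("A", 60.0), ("B", 30.0), ("B", 30.0), ("C", 90.0)]
for physician, row in sorted(peer_comparison(visits).items()):
    print(physician, row["cases"], row["mean"], round(row["difference"], 2))
```

Physician C's single case illustrates the earlier caution: a profile built on very few cases can show a large difference from the group that means little.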

This  list  was  compiled  through  literature  searches,  a  bibliography  obtained  from  the 
Agency  for  Health  Care  Policy  and  Research,  references  in  the  papers  themselves,  and  our 
own  files.  This  bibliography  is  not,  however,  an  exhaustive  review  of  the  profiling 
literature.  The  most  useful  Medline  search  terms  included  the  following:  Feedback  OR 
Stats  and  Numerical  Data  OR  Utilization  AND  Physician  Practice  Patterns  OR  Standards. 
At  the  end  of  this  appendix,  there  is  a  bibliographical  summary. 

Profiling  Literature 

Applegate  WB,  Bennett  MD,  Chilton  L,  et  al.,  "Impact  of  a  Cost-Containment  Educational 
Program  on  Housestaff  Ambulatory  Clinic  Charges,"  Medical  Care  21(5):486-496,  1983. 


Objective:      Decrease ordering of diagnostic laboratory tests
Setting:        Teaching hospital
Interventions:  Education and audit discussions combined with feedback (quarterly
                summaries of charges generated by each resident compared to the clinic
                mean; sample bills). The intent of the feedback was to create an
                atmosphere for questioning cost (but costs were not linked to specific
                diseases).
Results:        Intervention associated with significant reduction in mean laboratory
                charges ($6.30) and in total encounter charges ($10.36).


Barton  MB,  and  Schoenbaum  SC,  "Improving  Influenza  Vaccination  Performance  in  an 
HMO  Setting:  The  Use  of  Computer-Generated  Reminders  and  Peer  Comparison 
Feedback,"  American  Journal  of  Public  Health  80:534-536,  1990. 


Objective:      Improve influenza immunization performance
Setting:        HMO
Interventions:  1) postcards to patients, educational materials, chart reminders to
                   physicians, performance feedback to chiefs
                2) all of the above interventions plus individual feedback to
                   physicians (lists of unimmunized patients), plus peer comparison
                   feedback
Results:        Intervention #1 increased immunization rates (in patients >65) to 42%,
                but only intervention #2 resulted in a statistically significant change
                (to 60%). Effective ongoing intervention (several years).

Berwick DM, and Coltin KL, "Feedback Reduces Test Use in a Health Maintenance
Organization," Journal of the American Medical Association 255(11):1450, 1986.

Objective:      Decrease ordering of blood tests and X-rays
Setting:        HMO
Interventions:  1) departmental educational meetings
                2) monthly peer comparison feedback on cost of test use
                3) monthly peer comparison feedback on yield of tests
Results:        14.2% decrease in test ordering and 8.3% decrease in variation of test
                ordering among physicians receiving peer comparison feedback on cost.
                Intervention #3, feedback about yield (rates of abnormal results),
                failed to produce a change.


Braham  RL,  and  Ruchlin  HS,  "Physician  Practice  Profiles:  A  Case  Study  of  the  Use  of 
Audit  and  Feedback  in  an  Ambulatory  Care  Group  Practice,"  Health  Care  Management 
Review  12(3):  11-6,  1987. 

Objective:      Decrease ancillary service charges
Setting:        Ambulatory group practice
Intervention:   Semiannual letter from the medical director suggesting areas for
                improvement, plus feedback listing each physician's average charge per
                visit (initial and revisits) compared to peers in their group and to
                the entire practice average. Comparable data provided for selected
                diagnoses.
Results:        Constant-dollar charges for ancillary services declined by 23%.


Buffington  J,  Bell  KM,  LaForce  FM,  et  al.,  "A  Target-Based  Model  for  Increasing 
Influenza  Immunization  in  Private  Practice,"  Journal  of  General  Internal  Medicine  6:204-209, 
1991. 


Objective:      Increase flu immunization rates in patients over 65 years old
Setting:        Private practice
Interventions:  1) concurrent feedback (providers tracked immunization rates on
                   posters)
                2) concurrent feedback plus postcards to patients needing
                   immunizations
Results:        Immunization rates were as follows: control 50%, feedback plus postcard
                67%, feedback alone 66%. Postcards did not appreciably increase
                immunization rates (data from public clinics indicated that postcards
                simply moved immunizations from public to private clinics).


Carey  TS,  Levis  D,  Pickard  CG,  et  al.,  "Development  of  a  Model  Quality-of-Care 
Assessment  Program  for  Adult  Preventive  Care  in  Rural  Medical  Practices,"  Quality  Review 
Bulletin  17(2):54-9,  1991. 


Objective:      Improve screening and prevention
Setting:        37 rural clinics
Interventions:  Peer comparison feedback
Results:        Feedback improved recorded performance in 8 of 9 parameters capable of
                improvement (i.e., Pap, clinical breast exam, breast exam instruction,
                mammogram, rectal exam, stool occult blood test, assessment of
                cigarette use, assessment of alcohol use). Flu immunization performance
                did not improve, and blood pressure performance was 100% in both years.


Cayten  CG,  Tanner  LA,  Riedel  DC,  et  al.,  "Surgical  Audit  Using  Predetermined  Weighted 
Criteria,"  Connecticut  Medicine  38(3):  117-122,  1974. 


Objective:      Improve surgical care for six specific diagnoses
Setting:        Teaching hospital
Intervention:   Individual, peer comparison, and aggregate data on work-up and
                complication criteria (developed and weighted for importance by peers)
                for all six diagnoses was sent to physicians once.
Results:        No overall improvement in work-up or complication scores.


Chassin  MR,  and  McCue  SM,  "A  Randomized  Trial  of  Medical  Quality  Assurance: 
Improving  Physicians'  Use  of  Pelvimetry,"  Journal  of  the  American  Medical  Association 
244:565-70,  1986. 


Objective:      Decrease inappropriate ordering of X-ray pelvimetry
Setting:        Study conducted by six PSROs in 120 hospitals in six states
Intervention:   Obstetricians participated in development and presentation of an
                educational program; monthly aggregate feedback was distributed to
                chiefs and posted in physicians' lounges.
Results:        Highly significant decrease in physicians' ordering of pelvimetry.
                Effects persisted up to 10 months after the conclusion of the feedback
                period.


Chodroff  DH,  "Cancer  Screening  and  Immunization  Quality  Assurance  Using  a  Personal 
Computer,"  Quality  Review  Bulletin  August:279-287,  1990. 


Objective:      Increase cancer screening
Setting:        Ambulatory clinic in a teaching hospital
Interventions:  1) concurrent reminder form noting which interventions are due, along
                   with most recent dates and results of screening tests
                   (computer-generated by a personal computer)
                2) monthly peer comparison feedback (computer assisted)
Results:        Significant increases in cancer screening rates: from 37% to 62%
                (mammogram); from 45% to 68% (clinical breast exam); from 42% to 71%
                (Pap); from 32% to 64% (rectal exam); from 22% to 65% (fecal occult
                blood test); from 7% to 50% (tetanus); from 49% to 87% (pneumococcal
                vaccine); from 36% to 85% (flu immunizations). Ongoing intervention
                with sustained positive results.


Cohen  DI,  Jones  P,  Littenberg  B,  et  al.,  "Does  Cost  Information  Availability  Reduce 
Physician Test Usage?" Medical Care 20(3):286, 1982.

Objective:      Decrease ordering of diagnostic laboratory tests and X-rays
Setting:        Teaching hospital
Intervention:   Daily, individual, itemized lists of laboratory and X-ray test costs
Results:        Intervention decreased test usage, but only when team leaders took an
                active interest in the cost-control program. Test use continued to fall
                after the experimental period in groups where the team leader took an
                interest.

Cruse PJE, and Foord R, "A Five-Year Prospective Study of 23,649 Surgical Wounds,"
Archives  of  Surgery  107:206-210,  1973. 

Cruse  PJE,  and  Foord  R,  "The  Epidemiology  of  Wound  Infection:  A  10  Year  Prospective 
Study  of  62,939  Wounds,"  Surgical  Clinics  of  North  America  60:27-40,  1980. 

Objectives:     Decrease the proportion of surgical clean wound infections
Setting:        Surgical ward of a teaching hospital
Intervention:   Monthly aggregate infection rates posted in the operating room;
                analysis of infected wounds posted and discussed at Infection Control
                meetings; aggregate infection rates discussed at monthly departmental
                meetings; feedback of individual infection rates compared to the
                hospital's aggregate rates (semiannually for the first five years,
                annually for the second five years).
Results:        Authors conclude that feedback of data, together with identification
                and application of several changes in surgical preparation and
                technique, led to a steady decline in clean wound infection rates from
                2.5% to 0.6% over a 10-year period.


Devitt  JE,  "Does  Continuing  Medical  Education  by  Peer  Review  Really  Work?"  Canadian 
Medical  Association  Journal  108:1279-1281,  1973. 


Devitt  JE,  and  Ironside  MR,  "Can  Patient  Care  Audit  Change  Doctor  Performance?" 
Journal  of  Medical  Education  50:1122-1123,  1975. 


Objectives:     a) Decrease the number of unnecessary breast biopsies
                b) Decrease the length of postoperative stays
Setting:        Community hospital in Canada
Intervention:   Standards developed for proportion of biopsies positive for cancer and
                length of postoperative stay; data from a four-year period presented at
                grand rounds and compared to standards; peer comparison data was sent
                to each physician.
Results:        Improvement in proportion of biopsies positive and length of stay had
                begun pre-intervention and continued; moreover, there was a significant
                decrease in the number of unnecessary breast biopsies.


Everett  GD,  deBlois  S,  Chang  PF,  et  al.,  "Effect  of  Cost  Education,  Cost  Audits,  and 
Faculty  Chart  Review  on  the  Use  of  Laboratory  Services,"  Archives  of  Internal  Medicine 
143:942,  1983. 


Objective:         Decrease  ordering  of  diagnostic  laboratory  tests 
Setting:  Teaching  hospital 

Interventions:     1)  Cost  education:  newsletters 

2) Cost audit: weekly, computerized feedback of tests ordered and
charges per patient-day for each first-year resident

3)  Cost  education  and  cost  audit 

4)  Faculty  chart  review 


Results:  There was no change with cost education alone (a non-profiling
intervention) and an increase in laboratory test ordering with cost audit
alone. In the group receiving combined cost education and cost audit, success
was limited to a significant decrease in total tests ordered (9.4%); charges
fell by only 1.2%.

The face-to-face chart audit program (intervention #4), which did not involve
feedback of profiles, decreased total tests ordered by 15.1% and charges by an
average of 9.8% (with the greatest drops in ordering of 12-component test
panels and complete blood counts).




Feely  J,  Chan  R,  Cocoman  L,  et  al.,  "Hospital  Formularies:  Need  for  Continuous 
Intervention,"  British  Medical  Journal  300:28-30,  1990. 


Objective:  Increase ordering of generic drugs and decrease inappropriate ordering
of cephalosporins

Setting:  General  hospital  (in  Ireland) 

Intervention:  Drug  formulary  combined  with  peer  comparison  feedback  of  prescribing 
patterns  (with  examples  of  specific  savings);  prescribing  patterns 
discussed  at  monthly  meetings. 


Results:  Generic prescribing rose by 50% with feedback, and compliance with
cephalosporin recommendations was good (a 29% reduction in annual use). Drug
costs rose during the next year, when feedback was discontinued and only the
formulary was maintained.


Frazier  LM,  Brown  JT,  Divine  GW,  et  al.,  "Can  Physician  Education  Lower  the  Cost  of 
Prescription Drugs? A Prospective, Controlled Trial," Annals of Internal Medicine 115:116-121, 1991.


Objective:  Decrease prescribing expenses

Setting:  Medicine clinic in a teaching hospital

Interventions:  1) Manual of drug prices with advice; weekly cost-oriented
reminders; two individual feedback reports (with practice averages for
comparison) sent three months apart, listing drugs prescribed, the average
number of months' supply ordered per prescription, and a note about potential
patient savings

2) Manual-based educational program on cholesterol management

Results:  The feedback group (intervention #1) prescribed less expensive drugs
within drug classes; the change in drug price score was -0.15. There was also
an increase of 0.74 in months' supply of medication (reducing dispensing fees).




Gehlbach  SH,  Wilkinson  WE,  Hammond  WE,  et  al.,  "Improving  Drug  Prescribing  in  a 
Primary  Care  Practice,"  Medical  Care  22:193-201,  1984. 


Objective:  Increase  rates  of  ordering  of  generic  drugs  in  a  family  medicine  practice 
Setting:  Family  medicine  residency  practice 

Intervention:  Monthly, computerized, individual feedback listing brand-name
drugs ordered (with suggestions for generic alternatives and potential cost
savings) and the names and number of generic drugs ordered

Results:  Generic prescribing increased from a baseline of 14% to 67%, a
significant increase compared with the control group. Both the concurrent
timing and a respected source of recommendations appear to have been important
to this success. Results persisted during the year after the intervention,
with a mild decrease in generic prescribing (down to 57%).


Gortmaker  SL,  Bickford  AF,  Mathewson  HO,  et  al.,  "A  Successful  Experiment  to  Reduce 
Unnecessary  Laboratory  Use  in  a  Community  Hospital,"  Medical  Care  26(6):631,  1988. 

Objective:         Decrease  ordering  of  diagnostic  laboratory  tests 

Setting:  General  hospital 

Intervention:  Nine staff meetings on the costs of unnecessary laboratory tests;
peer comparison reports of estimated excess tests were mailed to physicians.
Lecture and feedback were repeated one year later.

Results:  Feedback decreased laboratory test ordering by 1.8 tests per patient
and led to a significant decrease in laboratory charges per patient for the
hospital overall. There was no evidence of a decrease in "essential" tests. Net
savings were 5-8 times the cost of the program.


Grivell  AR,  Forgie  HJ,  Fraser  CJ,  et  al.,  "Effect  of  Feedback  to  Clinical  Staff  of 
Information  on  Clinical  Biochemistry  Requesting  Patterns,"  Clinical  Chemistry  27:1717-20, 
1981. 

Objective:         Decrease  number  of  unnecessary  clinical  biochemistry  laboratory  tests 




Setting:  Inpatients in a tertiary-care community hospital (in Australia)

Intervention:  Monthly, individual feedback listing type and number of specimens
ordered, number of patients for whom they were ordered, number of abnormal
specimens, number of tests, number of abnormal results, and potential costs
(histograms of test ordering frequencies for the entire group also accompanied
the individual feedback)

Results:  No decrease in unnecessary clinical biochemistry laboratory tests.


Henderson D, D'Alessandri R, Westfall B, et al., "Hospital Cost Containment: A Little
Knowledge Helps," Clinical Research 27:279A, 1979.


Objective:  Decrease hospital charges for inpatients

Setting:  Teaching hospital

Intervention:  Daily, individualized, computerized feedback (printout of all
charges incurred by each physician's patients)

Results:  There was a 29% reduction in total charges per patient.


Hershey  CO,  Porter  DK,  Breslau  D,  et  al.,  "Influence  of  Simple  Computerized  Feedback 
on  Prescription  Charges  in  an  Ambulatory  Clinic:  A  Randomized  Clinical  Trial,"  Medical 
Care  24:472-81,  1986. 


Objective:         Decrease  prescription  charges 

Setting:  Ambulatory  clinic  of  a  teaching  hospital 

Intervention:      Monthly,  computer-generated,  individual  and  aggregate  feedback  of 
prescription  charges 

Results:  Feedback had a small effect (only one of three parameters improved
significantly, and there was a long latent period before effects were seen).
However, because the data were so easy to retrieve from the hospital billing
system, this intervention had a high benefit-to-cost ratio.




Hirsch  EO,  "Utilization  Review  as  a  Means  of  Continuing  Education,"  Medical  Care 
12(4):358-362,  1974. 


Objective:         For  patients  hospitalized  for  congestive  failure: 

a)  decrease  length  of  stay 

b)  decrease  number  of  days  to  cardiac  compensation 

c)  decrease  the  number  of  chest  X-rays 

d)  decrease  the  number  of  electrocardiograms 

e)  decrease  the  number  of  SMA-12  laboratory  tests 

Setting:  Community  hospital 

Intervention:      Face-to-face  discussion  of  individual  data  from  prior  year  compared  to 
peers;  repeated  one  year  later. 

Results:  There were significant decreases in average length of stay (22 to 12
days), days to improvement (13 to 3.3), and the number of SMA-12 laboratory
tests (8.6 to 4.2).


Keller RB, Soule DN, Wennberg JE, et al., "Dealing with Geographic Variations in the
Use of Hospitals: The Experience of the Maine Medical Assessment Foundation
Orthopaedic Study Group," The Journal of Bone and Joint Surgery 72-A(9):1286-1293, 1990.

Objective:         Decrease  variation  in  hospital  utilization  for  lumbar  disc  excisions 

Setting:  Hospitals  in  15  of  Maine's  largest  population  areas 

Intervention:      Study  group  presented  analysis  of  data  in  a  face-to-face  meeting  with 
orthopedists  and  neurosurgeons  from  around  the  state 

Results:  Excision rates in Area #3 and surrounding areas (which had the
highest incidence of excision in the years before the feedback) decreased to
normal levels in the following three years. The authors conclude that major
reasons for the variation were related to a lack of agreement about optimum
treatment.


Kincaid  WH,  "Changing  Physician  Behavior:  The  Peer  Data  Method,"  Quality  Review 
Bulletin  238-242,  1984. 


Objective:  Decrease length of stay

Setting:  Hospitals in northwest Ohio


Intervention:  Peer comparison data and letters to physicians with greater-than-average
lengths of stay. The study also followed a group of physicians with shorter
lengths of stay to determine whether their lengths of stay changed without
intervention during the study period.

Results:  Gap  between  high  user  group  and  average  was  reduced  by  42.9%. 


Kroenke  K,  Hanley  JF,  Copley  JB,  et  al.,  "Improving  House  Staff  Ordering  of  Three 
Common  Laboratory  Tests:  Reduction  in  Test  Ordering  Need  Not  Result  in 
Underutilization,"  Medical  Care  25(10):928,  1987. 


Objective:  1) Increase percentage of clinically indicated urine cultures, sputum
cultures, and admission urinalyses
2) Determine whether underutilization becomes a problem

Setting:  Military teaching hospital

Interventions:  1) Weekly lectures (first 10-week study period) on indications
for urine cultures, sputum cultures, and admission urinalyses, (aggregate?)
data on actual test ordering from the baseline period, and weekly individual
feedback on the percentage of clinically indicated tests ordered

2) Required reason for ordering the test on the order sheet (second 10-week
study period)

Results:  Intervention #1, weekly individual feedback reports, significantly
increased the proportion of indicated tests. Intervention #2, requiring a
"reason for ordering" on the order forms for urine and sputum cultures and
urinalysis, yielded an even greater increase in the proportion of indicated
tests ordered. Underutilization was less during the two intervention periods
than it was nine months after the intervention.


Laxdal  OE,  Jennett  PA,  Wilson  TW,  et  al.,  "Improving  Physician  Performance  by 
Continuing Medical Education," Canadian Medical Association Journal 118:1051-1058, 1978.

Objective:         Decrease  the  frequency  of  12-15  prescribing  problems 

Setting:  Three  rural  hospitals 




Interventions:  Manually retrieved, aggregate data sent to physicians within
1-2 weeks; lectures and discussions

Results:  The average frequency of prescribing problems was reduced by 63%
(percentage of possible improvement), whereas in control hospitals it
decreased by only 32%.


Lomas  J,  Enkin  M,  Anderson  GM,  et  al.,  "Opinion  Leaders  vs  Audit  and  Feedback  to 
Implement Practice Guidelines: Delivery After Previous Cesarean Section," Journal of the
American  Medical  Association  265(17):2202-2207,  1991. 

Objective:  1) Decrease the number of repeat cesarean sections
2) Increase the number of vaginal births after cesarean section

Setting:  16 community hospitals

Interventions:  1) Physicians participated in establishing guidelines; charts
were audited for compliance; meetings were held every three months to discuss
aggregate feedback

2) A designated opinion leader presented the guideline and "detailing" sheets
to local physicians, hosted a local lecture/discussion with an outside expert,
and maintained formal and informal contact with local physicians.

Results:  Aggregate feedback (intervention #1) failed to decrease the number of
repeat cesarean sections or to increase the number of vaginal births after
cesarean section compared with controls. In contrast, in the opinion leader
group, trials of labor were 46% higher and vaginal births were 85% higher than
in controls.


Lyle  CB,  Bianchi  RF,  Harris  JH,  et  al.,  "Teaching  Cost  Containment  to  House  Officers  at 
Charlotte  Memorial  Hospital,"  Journal  of  Medical  Education  54:856-862,  1979. 

Objective:         Decrease  average  length  of  stay  for  inpatients;  decrease  ordering  of 
unnecessary  laboratory  tests  for  outpatients 

Setting:  Teaching  hospital 




Interventions:  1) First two years (inpatient data): monthly, face-to-face
review of total costs incurred for each patient; third year: itemized bills
reviewed with residents every three days

2) First year (outpatient data): monthly, face-to-face review of activities and
costs generated; second year: daily, face-to-face review of data

Results:  During the first three and one-half years, average length of stay
was reduced by 21%, and despite a 25% increase in unit test costs, the average
cost of tests per patient decreased by 5%.

Marton  KL,  Tul  V,  Sox  HC,  "Modifying  Test  Ordering  Behavior  in  the  Outpatient  Medical 
Clinic:  A  Controlled  Trial  of  Two  Educational  Interventions,"  Archives  of  Internal  Medicine 
145:816,  1985. 


Objective:  Decrease  ordering  of  diagnostic  laboratory  tests 
Setting:  Medicine  clinics  at  two  teaching  hospitals 

Interventions:     1)  Manual  on  cost-effective  laboratory  ordering 


2)  Peer  comparison  feedback 


3)  Manual  and  peer  comparison  feedback 


Results:  The manual plus peer comparison feedback (intervention #3) had a much
larger effect on test ordering (a 42% decrease when controlled for diagnosis)
than either intervention alone.


McPhee  SJ,  Bird  JA,  Jenkins  CNH,  et  al.,  "Promoting  Cancer  Screening:  A  Randomized, 
Controlled Trial of Three Interventions," Archives of Internal Medicine 149:1866-1872, 1989.

Objective:         Increase  cancer  screening 

Setting:  Group  practice  (teaching) 

Interventions:     1)  Concurrent  reminders  to  physicians  (list  of  overdue  tests  at  visit) 

2) Monthly, individual feedback (seminars with feedback about performance
rates, compared confidentially to peers and group means)

N.B.: half of the patients in each group (#1 and #2) received education in the
form of letters about overdue tests and literature

Results:  Feedback (intervention #2) significantly increased performance of
breast examination and mammography only. Concurrent reminders (intervention #1)
significantly increased performance of all tests except the Pap smear (which
already had high performance before the intervention).

Myers SA, and Gleicher N, "A Successful Program to Lower Cesarean Section Rates," The
New England Journal of Medicine 319:1511-1516, 1988.

Objective:  Decrease  the  number  of  cesarean  section  deliveries  from  25%  to  11%  of 
all  deliveries 

Setting:  Inner-city  hospital 

Intervention:  Voluntary physician participation, required second opinion,
establishment of objective criteria, quarterly individual feedback on cesarean
section rates, and monthly staff conferences presenting department-wide
cesarean section rates in comparison with guidelines

Results:  Although the overall cesarean section rate (primary and repeat)
decreased from 17.5% to 11.5%, only the decline in primary cesarean sections
(from 12% to 6.8%) was significant. Operative vaginal deliveries also declined
significantly, from 10.4% to 4.3%. There were no adverse outcomes (for mothers
or infants) associated with the decline in cesarean section rates.


Nattinger  AB,  Panzer  FJ,  and  Janus  J,  "Improving  the  Utilization  of  Screening 
Mammography  in  Primary  Care  Practices,"  Archives  of  Internal  Medicine  149:2087-2092, 
1989. 

Objective:         Increase  percentage  of  women  (ages  50-74)  with  annual  mammogram 
Setting:  Ambulatory  clinic  of  a  teaching  hospital 

Interventions:     1)  "Visit-based  cue"  a)  for  patient:  educational  handout  b)  for  physician: 
prefilled  radiology  requisition  form 

2)  Monthly,  computerized  feedback  to  residents  (individual  for  first  three 
months,  peer  comparison  for  second  three  months) 




Results:  Feedback (62%) and "visit-based cues" (54%) yielded a greater
proportion of women with an annual mammogram than in the control group (36%).


Palmer  RH,  Louis  TA,  Hsu  LN,  et  al.,  "A  Randomized  Controlled  Trial  of  Quality 
Assurance  in  Sixteen  Ambulatory  Care  Practices,"  Medical  Care  23:751-70,  1985. 

Objective:  Improve quality of care for each of several tests or parameters of
patient care believed to be important to patient outcome

Setting:  Eight medical and eight pediatric ambulatory practices

Interventions:  Educational sessions, an opportunity to state disagreement with
criteria, and face-to-face, individual feedback on the quality of task
performance for some of each provider's patients

Results:  Significantly improved performance on two of eight tasks important
to patient outcome.


Parrino  TA,  "The  Nonvalue  of  Retrospective  Peer  Comparison  Feedback  in  Containing 
Hospital  Costs,"  American  Journal  of  Medicine  86:442,  1989. 

Objective:         Decrease  antibiotic  prescriptions 

Setting:  Tertiary  referral  hospital 

Intervention:      Monthly,  computerized  peer  comparison  feedback  targeted  at  high 
antibiotic  use  attendings. 

Results:  Feedback did not produce a significant change in antibiotic usage for
patients of targeted attendings.


Pop  P,  and  Winkens  RAG,  "A  Diagnostic  Centre  for  General  Practitioners:  Results  of 
Individual  Feedback  on  Diagnostic  Actions,"  Journal  of  the  Royal  College  of  General 
Practitioners 39(329):507-8, 1989.

Objective:         Decrease  ordering  of  diagnostic  tests 

Setting:  General  practices  (in  the  Netherlands) 




Intervention:  Semiannual, individualized feedback on one month of test orders
compared with the overall group of physicians

Results:  Test ordering rates decreased after feedback was initiated (with a
large decrease in test ordering in the seventh and eighth years).


Pozen  MW,  and  Gloger  H,  "The  Impact  on  House  Officers  of  Educational  and 
Administrative  Interventions  in  an  Outpatient  Department,"  Social  Science  and  Medicine 
10:491-495,  1976. 

Objective:  Measure the effect of educational and administrative interventions
on house staff

Setting:  Ambulatory clinic in a teaching hospital

Interventions:  1) Education: lectures and case discussions

2) An administrator assists with patient follow-up and compiles individual,
monthly feedback reports (number of drugs prescribed, calculated drug
prescribing index, number of laboratory and X-ray tests ordered, and
percentage of positive results)

Results:  Feedback yielded a lower drug prescribing index, but there was no
change in laboratory or X-ray test orders.


Pugh JA, Frazier LM, DeLong E, et al., "Effect of Daily Charge Feedback on Inpatient
Charges and Physician Knowledge and Behavior," Archives of Internal Medicine 149:426-429, 1989.


Objective:  Decrease  hospital  charges  (total  charges,  length  of  stay,  and  daily  room 
charge) 

Setting:  Inpatient  ward  of  a  teaching  hospital 

Intervention:  Daily,  individual  feedback  placed  on  patients'  charts  including  mean  total 
charges,  length  of  stay  and  room  charges  compared  to  the  daily  average 
for  patients  with  similar  diagnoses  from  the  prior  year 

Results:  No difference was found at first; however, 45% of patients were
protocol admissions. When protocol admission patients were excluded, there was
a highly significant reduction in total charges, length of stay, room charges,
and diagnostic testing.


Putnam  W,  and  Curry  L,  "Patient  Care  Appraisal  in  the  Ambulatory  Setting:  Effectiveness 
as  a  Continuing  Education  Tool,"  Annual  Conference  on  Research  in  Medical  Education 
19:207-22,  1980. 


Objective:         Improve  treatment  of  bronchitis,  headache,  otitis  media,  hypertension, 
and  urinary  tract  infection 

Setting:  Ambulatory  family  practices  (in  Canada) 

Intervention:      Physicians  participated  in  criteria  setting,  face-to-face  individualized 
feedback  of  data,  and  received  educational  packages  upon  request. 


Results:  The intervention yielded significant improvement in compliance with
"essential" standards of care.


Rosser  WW,  Simms  JG,  Patten  DW,  et  al.,  "Improving  Benzodiazepine  Prescribing  in 
Family  Practice  through  Review  and  Education,"  Canadian  Medical  Association  Journal 
124(2):147-53, 1981.


Objective:         Improve  benzodiazepine  prescribing 
Setting:  Family  medicine  centre 

Intervention:  As  part  of  an  educational  program,  physicians  informed  of  their  actual 
prescribing  patterns  of  benzodiazepines  against  recommendations  of 
guidelines  for  benzodiazepine  use 

Results:  During the six months after the intervention, there was a decrease in
the prescribing of benzodiazepines to patients 65 years of age and over, a
significant shift to the use of short-acting benzodiazepines, and some
reduction in the daily dose and duration of administration of diazepam.


Schectman  JM,  Elinsky  EG,  and  Pawlson  LG,  "Effect  of  Education  and  Feedback  on 
Thyroid  Function  Testing  Strategies  of  Primary  Care  Clinicians,"  Archives  of  Internal 
Medicine  151:2163-2166,  1991. 




Objective:  Increase compliance with guidelines for thyroid function test ordering


Setting:  Primary  care  HMO 

Interventions:     1)  Educational  memorandum 

2)  Reminder  memorandum  (two  months  later) 

3)  Reminder  memorandum  and  individual  feedback  (average  number  and 
type  of  thyroid  function  tests  ordered  per  patient) 

Results:  Education (intervention #1) increased the compliance of physicians,
physician assistants, and nurse practitioners with thyroid function test
ordering guidelines from 36% to 67%. Reminders alone (intervention #2)
increased compliance from 68% to 81% at six months and to 79% at 12 months.
Reminders plus feedback (intervention #3) did not produce an increase
(compliance was 65% before and 64% after the intervention).


Schroeder  SA,  Renders  K,  Cooper  JK,  et  al.,  "Use  of  Laboratory  Tests  and 
Pharmaceuticals: Variation Among Physicians and Effect of Cost Audit on Subsequent
Use," Journal of the American Medical Association 225(8):969-973, 1973.


Objective:  Decrease laboratory and pharmaceutical charges

Setting:  University medical clinic

Intervention:  Peer comparison feedback (mean annual laboratory tests,
prescriptions, and combined costs for the patients of each physician during the
prior three months), repeated three months later with the same group of
patients for each physician

Results:  Feedback yielded a 29.2% decrease in laboratory charges but led to a
6.4% increase in pharmaceutical charges. Decreases in laboratory charges were
greatest among high-cost physicians.


Schroeder SA, Myers LP, McPhee SJ, et al., "The Failure of Physician Education as a Cost
Containment  Strategy:  Report  of  a  Prospective  Controlled  Trial  at  a  University  Hospital," 
Journal  of  the  American  Medical  Association  252:225-230,  1984. 

Objective:         Decrease  total  charges  to  patients 




Setting:  Teaching  hospital 

Interventions:     First  year:  1)  weekly  lectures,  2)  face-to-face  audit  of  medical  record  with 
group  discussion  (also  copies  of  patients'  bills) 
Second  year:  interventions  combined,  attending  involved;  weekly 
individual  feedback  on  the  percent  and  cost  of  the  services  that  were 
rated  as  clinically  unnecessary  in  previously  audited  cases. 

Results:  No  significant  decrease  in  total  charges  to  patients  for  any  intervention. 

Spiegel  JS,  Shapiro  MF,  Berman  B,  et  al.,  "Changing  Physician  Test  Ordering  in  a 
University  Hospital:  An  Intervention  of  Physician  Participation,  Explicit  Criteria,  and 
Feedback,"  Archives  of  Internal  Medicine  149(3):549-53,  1989. 


Objective:  Decrease inappropriate ordering of specific tests: routine
urinalyses, chest X-rays, leukocyte differential counts, and prothrombin time
and/or partial thromboplastin time tests


Setting:  Teaching  hospital 

Intervention:      Attendings  (believed  to  also  be  leaders)  gave  flowsheets  of  tests  ordered 
to  residents  along  with  explicit  criteria  twice  a  week. 


Results:  Orders for initial or admission chest X-rays decreased by 22%. Repeat
orders for routine urinalyses, chest X-rays, and leukocyte differential counts
decreased by 23%, 30%, and 46%, respectively. Ordering of prothrombin time and
partial thromboplastin time tests did not change.


Studnicki  J,  Stevens  CE,  and  Knisely  L,  "Impact  of  a  Cybernetic  System  of  Feedback  to 
Physicians on Inappropriate Hospital Use," Journal of Medical Education 60:454-460, 1985.

Objective:         Decrease  length  of  hospital  stay 

Setting:  Hospitals  in  western  Maryland 

Intervention:  Semimonthly,  individual  and  aggregate  feedback  (listed  specific  cases, 
aggregate  findings  for  each  physician  in  prior  two  weeks  and  over  the 
entire  study  period,  and  aggregate  findings  for  all  physicians  at  the 
hospital  and  all  physicians  in  all  six  study  hospitals;  three  most  frequent 
reasons for failure to meet criteria, and total number of admission failures
compared with hospital totals and totals at all six study hospitals)




Results:  Physicians with a high volume of patients also had a higher
percentage of inappropriate hospital use (13% versus 8%). Low-volume physicians
showed no decrease in their hospital use with feedback on inappropriate use,
whereas high-volume physicians had a 42% decrease (from the original 13% to 8%).

Sullivan  RJ,  Estes  EH,  Stopford  W,  et  al.,  "Adherence  to  Explicit  Strategies  for  Common 
Medical  Conditions,"  Medical  Care  18(4):388-399,  1980. 

Objective:         Improve  quality  of  care  for  urinary  tract  infection  and  upper  respiratory 
illness 


Setting:  University  primary  care  clinic 

Interventions:  Following staff approval of protocols and publication of a
booklet of protocols, data collection checklists were provided to staff to
document actions for all patients with symptoms of urinary tract infection or
upper respiratory infection. Based on the protocols, individual feedback was
given weekly during the first year (a tally of compliance with each checklist
item), every two weeks during the second year (a personalized note only), and
as computer-generated, individual reports during the last three months.


Results:  Alterations in feedback mechanisms had little relation to guideline
adherence.


Wennberg  JE,  Blowers  L,  Parker  R,  et  al.,  "Changes  in  Tonsillectomy  Rates  Associated 
with  Feedback  and  Review,"  Pediatrics  59(6):821-826,  1977. 

Objective:         Decrease  tonsillectomy  rates 

Setting:  Hospitals  in  13  service  areas  of  Vermont 

Intervention:      Feedback  of  variation  in  population-based  tonsillectomy  rates  to  the  state 
medical  society 

Results:  Over a five-year period, the average tonsillectomy rates in all areas
declined by 46% (with only one area above the national average); however, the
relationship between feedback and the change in practice could not be
documented.




Williams  SV,  and  Eisenberg  JM,  "Decreasing  Diagnostic  Test  Utilization,"  Journal  of 
General  Internal  Medicine  1:8-13,  1986. 

Objective:         Decrease  unnecessary  ordering  of  inpatient  laboratory  tests 
Setting:  Teaching  hospital 

Interventions:  First year:

1) Education regarding laboratory test use

2) Notification about patients who did not meet the standard

3) Monthly, individual feedback (with aggregate frequencies)

Second year: combined interventions

Results:  No significant effects on ordering of inpatient tests in either year.


Winickoff RN, Coltin KL, Morgan MM, et al., "Improving Physician Performance Through
Peer  Comparison  Feedback,"  Medical  Care  22:527-34,  1986. 

Objective:         Increase  colorectal  screening 

Setting:  HMO 

Interventions:     Monthly  peer  comparison  feedback  and  aggregate  feedback  on  screening 
performance  from  two  months  earlier 

Results:  Among physicians for whom education alone had previously failed to
improve performance, performance improved when feedback was given. The effect
persisted at least six months after the intervention was halted, but screening
declined somewhat over the next six months.


Wones  RG,  "Failure  of  Low-Cost  Audits  with  Feedback  to  Reduce  Laboratory  Test 
Utilization,"  Medical  Care  25(1):78,  1987. 

Objective:         Decrease  ordering  of  diagnostic  laboratory  tests 
Setting:  Teaching  hospital 




Intervention:  Semimonthly,  individualized  data  (total  test  charges,  total  patients  under 
care,  mean  tests  per  patient  per  day,  mean  total  charges  per  month; 
breakdowns  of  tests  ordered  and  charges,  number  of  each  test  ordered, 
and  total  spent  for  each  type  of  test)  and  group  comparison 

N.B.: the control group was notified that its ordering of laboratory tests was
being monitored

Results:  Feedback yielded no change in laboratory test ordering.




PAPER  NO.  5 


PUBLIC  ACCESS  TO  PROFILING  INFORMATION 


Author: 


Emily  Friedman 
Address: 


917  West  Wolfram 
Chicago,  IL  60657 



Whatever  else  might  be  compromising  dissemination  of  profiling  data  to  the  public,  it  is 
not  a  shortage  of  information.  Even  a  cursory  glance  at  the  literature  leaves  one  dazzled 
by  the  amount  of  data  on  hospital  utilization,  physician  practice,  and  outcomes  being 
released.  In  recent  months,  we  have  seen  analyses  of  hospital-specific,  and  now  (in  New 
York  state)  physician-specific  mortality  rates;  discussions  of  practice  variations  in  the 
treatment  of  breast  cancer;  documentation  of  the  lower  quantity  (and  perhaps  quality)  of 
care  afforded  to  the  medically  indigent;  and  a  host  of  other  numbers  about  health  care 
decisions and their consequences. Yet it is safe to say that the amount of this information
that has had a major effect on the public (at least in terms of people choosing providers
on the basis of such data) has been virtually nil.

With a few high-visibility exceptions, public use of health services research in general, and of
profiling information in particular, has not been widespread. The question
is  what  to  do  about  the  situation.  For  if  reliable,  high-quality,  properly  evaluated  profiling 
information  will  indeed  be  more  available  in  the  future,  and  if  the  goal  is  for  that 
information  to  influence  the  public's  health  care  choices,  then  it  is  necessary  to  learn  why 
dissemination  to  the  public  of  important,  even  crucial  health  care  data  and  profiling 
information  has  had  so  little  impact.  In  order  to  do  so,  both  exceptions  (when  data  have 
been  successfully  disseminated  and  have  had  a  significant  impact)  and  the  rule  must  be 
examined. 

This  is  not  to  say  that  the  dissemination  of  profiling  data  and  similar  information  has  been 
a  total  failure.  Indeed,  these  data  have  at  times  influenced  health  policy,  payment 
configuration,  and  the  practice  of  medicine  itself.  These  successes  have  made  health 
services  research  a  major  industry,  employing  more  and  more  people  in  academia,  research 
centers,  and  (increasingly)  entrepreneurial  ventures.  Nonetheless,  the  public  does  not 
appear  to  have  embraced  these  outpourings  to  any  great  degree. 

Three general observations about why this may be so are in order. First, it is important,
and yet difficult, for providers, researchers, and policymakers to keep in mind that
most human beings do not walk around thinking of themselves as patients. "Patienthood"
is a theoretical concept. More important, it is a transient state that most people wish to
avoid, and from which they wish to escape as soon as possible. With a few hypochondriacal
exceptions, few of us wish to think of ourselves as persistently sick or injured; even those
with  chronic  conditions  prefer  to  think  in  terms  of  optimal  function  and  striving  toward 
better  health. 


Therefore, when we speak of trying to reach the patient with profiling information, we are
usually talking about trying to reach people who are already sick or injured, and who are
already under treatment or seeking it. Their judgment and decision-making abilities are
thus likely to be somewhat compromised.

Furthermore,  when  our  "patienthood"  is  ended,  we  do  not  like  to  think  about  it  much. 
Probably  the  last  thing  we  want  to  dwell  on  is  data  concerning  our  recent  unhappy 
experience  and  how  it  ranked  in  comparison  with  other  people's  unhappy  experiences. 

So  when  people  do  have  a  high  consciousness  of  disease  and  its  treatment,  they  may  not 
be  in  a  state  that  makes  acceptance  and  use  of  profiling  data  easy.  And  when  they  are  in 
a  less  compromised  state,  we  are  faced  with  seeking  to  get  them  interested  in  and  involved 
with  this  information  at  a  time  when  it  is  not  really  important  to  them. 

Second, with advances in information technology, from cable television to videotape and
beyond, we live with information overload. That can breed indifference to valuable
information,  even  in  a  field  such  as  health  care,  which  has  been  demonstrated  to  spark 
high  levels  of  public  interest.  As  news  reports  and  publications  provide  more  and  more 
information  about  health  care  topics,  we  are  less  and  less  able  to  internalize  it  all. 

This  is  not  generally  true  if  we  are  suffering  from  a  particular  ailment;  in  that  case,  most 
patients  are  eager  for  as  much  information  as  they  can  get  about  their  condition.  But,  as 
we  rarely  know  what  illness  we  may  contract  in  the  months  or  years  to  come,  we  cannot 
focus  on  that  particular  condition.  We  instead  face  being  bombarded  with  massive 
amounts  of  information  about  myriad  illnesses,  injuries,  treatment  options,  and  so  forth. 
We  risk  becoming  jaded. 

Furthermore,  reflecting  the  general  view  of  what  the  news  media  choose  to  emphasize, 
much  of  the  health  care  news  is  bad.  Most  of  the  warnings  are  grim,  the  scandals 
proliferate,  and  now  even  Magic  Johnson  is  HIV-positive  and  Arthur  Ashe  has  AIDS. 

I  am  impressed  by  the  growing  number  of  Baby  Boomers,  even  those  with  high  health 
consciousness,  who  are  starting  to  leave  the  skin  on  the  chicken  or  eat  Whoppers  with 
cheese.  When  I  ask  why,  I  hear  what  was  once  a  throwaway  line:  "Everything  causes 
cancer." It does not seem an irrational conclusion in the face of an Alar, silicone implant,
or  radon  scare  every  two  or  three  months.  The  resulting  cynicism  works  against 
dissemination  of  profiling  information. 

This  jaded  response  to  health  care  data  is  a  particular  problem  for  health  services 
researchers  and  others  who  wish  to  disseminate  profiling  information  to  the  public.  For 
one  thing,  health  services  research  has  distinguished  itself  to  date  by  showing  virtually  no 
self-discipline  in  terms  of  the  amount  and  quality  of  the  numbers  it  spews  forth;  data 
overload is the order of the day. As Physician Payment Review Commission staff observed
in  a  recent  paper,  "The  availability  of  information  may  be  outpacing  our  ability  to  use 
profiling  effectively  and  responsibly"  (Lasker  et  al.  1992).  Yet  asking  researchers  to  be 
more  selective  in  terms  of  what  they  release  is  essentially  asking  an  embedded  research 
culture  to  change. 

Another  problem  is  that  those  who  are  working  with  profiling  information  have  not  proven 
adept  at  translating  it  into  terms  that  are  useful  even  for  providers,  let  alone  into  lay 
language  for  use  by  patients  and  the  public  at  large.  And  if  lay  people  are  proving 
resistant to accessible information, such as the documented risk of contracting lung cancer
from radon or the association of autoimmune disease with silicone implants, one wonders
how  receptive  they  will  be  to  complex  information  about  comparative  physician  practice 
patterns  in  the  diagnosis  and  treatment  of  gallbladder  disease. 

Indeed,  David  Axelrod,  M.D.,  former  New  York  state  health  commissioner,  was  quoted 
as  having  said  that  "he  was  concerned  about  the  implications  of  dumping  a  huge  body  of 
statistical  information  on  the  public  and  about  the  unpredictable  effects  that  the  release 
of  such  data  would  have  on  medicine,  patients,  and  the  legal  system"  (Altman  1992). 

A  third  impediment  to  patients'  using  profiling  data  to  make  provider  and  treatment 
choices  is  the  widespread  belief  that  patients  continue  to  have  the  power  to  make  such 
choices.  If  all  253  million  of  us  had  indemnity  insurance  with  open-ended  choice  of 
provider,  a  variety  of  hospitals  and  physicians  from  which  to  choose,  and  no  long-term 
provider  loyalties,  then  we  could  use  profiling  information  to  compare  and  contrast 
providers  and  select  those  whose  profiles  appeared  most  compatible  with  our  desires  and 
pocketbooks.  Indeed,  this  is  the  cornerstone  of  many  of  the  arguments  for  public  release 
of  profiling  data:  that  it  would  empower  patients. 

However,  a  person  is  only  theoretically  empowered  if  he  or  she  belongs  to  an  HMO,  PPO, 
or  "managed  indemnity"  plan,  as  90  percent  of  working  Americans  with  employer-based 
coverage  do  (KPMG  Peat  Marwick  1991).  If  our  self-insured  employer  announces  that  we 
will  be  treated  only  at  Memorial  Hospital  (unless  we  wish  to  pay  for  the  care  ourselves), 
then  Memorial  Hospital's  data  profile  is  irrelevant  to  us.  If  you  are  one  of  the  63  million 
Americans who live in a rural area, exercising a choice of institutional provider, or even
a  choice  of  physician,  can  require  leaving  the  county.  And,  of  course,  if  you  are  uninsured 
and  poor,  or  (increasingly)  a  Medicaid  client,  you  are  likely  to  be  grateful  to  any  provider 
who  is  willing  to  provide  care  to  you.  It  would  be  inadvisable  to  start  asking  about  that 
provider's  performance  data. 

And  if  decision  making  for  planned,  elective  health  care  services  is  compromised,  decision 
making  in  emergencies  is  almost  nonexistent.  A  trauma  patient  in  an  ambulance,  someone 
who  is  suffering  chest  pain,  or  a  child  who  just  sliced  open  his  finger  will  in  all  probability 
be  taken  to  the  nearest  emergency  department  that  is  available,  and  no  one  is  going  to 
check  that  hospital's  or  those  emergency  physicians'  credentials. 


In  other  words,  more  and  more  patients  are  not  free  to  choose  their  providers;  even  if  they 
insist,  even  if  they  are  armed  with  data,  they  often  are  not  allowed  to  make  such  choices 
unless  they  are  willing  to  pay  for  them. 

Thus  there  are  formidable  obstacles  to  the  direct  use  of  profiling  information  by  the  lay 
public.  Some  of  these  can  be  overcome;  but  some  cannot,  short  of  a  herculean  effort  that 
is  beyond  the  capacity  of  the  health  services  research  and  policy  communities  at  this  time. 

It thus seems likely that, at least in the short term, the release of profiling information for
public  use  will  consist  of  the  release  of  such  information  to  "brokers"  who  represent,  or 
claim  to  represent,  the  public's  interest.  Among  these  are: 

Third-party  payers  such  as  the  Health  Care  Financing  Administration  and  private  insurers, 
who  are  already  deeply  involved  in  the  generation  and  use  of  profiling  information. 
However, one might observe, cynically, that in many cases their enthusiasm is more for
cost containment than for ensuring the quality and appropriateness of care. Still, it
is hardly irrational on their part to want to protect a huge investment.

Employers, in either of two roles: as those who decide what type of coverage to purchase,
or  as  those  who  are  actually  payers,  as  in  the  case  of  self-insured  employers.  Growth  in 
the  number  of  employers  who  contract  directly  with  providers  places  them  even  more 
centrally  in  the  role  of  broker  acting  on  behalf  of  patients. 

Consumer  organizations,  which  often  conduct  their  own  data  analysis  and  have  proven  very 
effective  at  drawing  the  attention  of  the  press  to  their  conclusions.  They  have  played  a  key 
role  in  situations  involving  medical  devices,  pharmaceuticals,  questionable  surgical 
procedures,  and  environmental  health  hazards.  However,  many  of  these  organizations  tend 
to  exhibit  distrust  or  even  dislike  of  physicians  and  hospitals.  Although  this  may  be 
justified  in  some  cases,  it  makes  consumer  advocates  poor  candidates  for  educating 
providers  or  aiding  them  in  changing  their  patterns  of  practice.  After  all,  terrorizing 
physicians  and  hospitals  has  not  proven  to  be  a  very  effective  pathway  to  change. 

Consumer  advocates  have  also  been  known  to  oversimplify  issues  and  deal  in  very  broad 
strokes.  One  recent  example  was  the  characterization  by  a  leading  consumer  advocate  of 
a  severity  adjustment  system  as  "unassailable,"  which  is  not  true  of  any  severity  system 
currently  available. 

The  news  media  and  consumer  press,  which  report  much  health  care  news  and  will  continue 
to  do  so.  Their  reportage,  however,  is  heavily  weighted  toward  "breakthroughs"  on  the  one 
hand  and  scares  and  scandals  on  the  other.  The  public  press  is  drawn  to  the  extremes  of 
health  care,  good  and  bad;  it  has  shown  far  less  interest  in  more  subtle  issues,  such  as 
physician practice pattern variations or even hospital-specific mortality rates, which, after
all the controversy, have become, for the most part, a non-event as far as the public is
concerned. 

Attorneys, especially the plaintiffs' bar, which is becoming skilled at the selective use of data
and can be expected to use profiling data as they become available. Unfortunately, this
use is likely to take the clumsy form of retrospective quality assessment that
malpractice  litigation  represents.  Although  an  experiment  in  Maine  is  seeking  to  use 
outcomes  data  and  clinical  protocols  to  reduce  frivolous  or  inappropriate  malpractice 
litigation,  its  results  are  not  yet  known. 

Providers,  both  institutional  and  individual,  who  may  seem  odd  candidates  to  serve  as 
brokers  of  profiling  information,  given  that  it  involves  data  about  themselves  and  their 
peers.  But  that  potential  conflict  of  interest  is  no  more  troubling  than  the  conflict  of 
interest  faced  by  payers  who  are  trying  to  control  costs  and  ensure  quality  simultaneously. 

Besides,  policymakers  have  long  used  providers  as  brokers  in  a  different  way:  by  using 
hospitals  to  get  to  physicians.  This  has  not  worked  very  well,  and  has  produced  some 
problematic  unintended  consequences,  not  the  least  of  which  is  the  fracturing  of  the 
physician-hospital  bond,  which,  even  if  it  was  volatile,  did  exist.  Nevertheless,  the  effort 
to  use  hospitals  to  modify  physician  behavior  (as  in  the  implementation  of  Medicare 
prospective  payment)  was  predicated  on  an  assumption  that  providers  can  influence  each 
other  and  cause  each  other  to  change,  and  it  is  a  valid  assumption. 

Indeed, it can be argued that providers, directly and through the Joint Commission on
Accreditation of Healthcare Organizations (JCAHO), can serve as ideal brokers of
profiling  information.  Who  else  is  better  qualified  to  study,  use,  and  react  to  these  data? 
Furthermore,  as  the  goal  is  change  and/or  consistency  in  provider  behavior,  asking 
providers  to  effect  that  change  is  the  most  direct  way  to  do  it  and  allows  the  quickest 
response  time.  Indeed,  with  the  growth  of  managed  care,  the  possibilities  of  an 
organization  like  the  Kaiser  Foundation  Health  Plan  setting  data-informed  standards  for 
itself,  and  providing  a  model  for  others,  are  tantalizing. 

It  must  be  conceded,  however,  that  providers  do  not  have  an  outstanding  or  even 
acceptable record of picking up and acting on clues, even blatant clues, that practice
patterns  should  be  scrutinized  or  changed.  Indeed,  the  tendency  of  physicians  to  protect 
their  own,  at  the  expense  of  patients,  has  been  so  widespread  that  David  Axelrod  once 
lamented,  "Physicians  have  forgotten  the  difference  between  what  they  owe  to  fellow 
professionals  and  what  they  owe  to  their  profession"  (Friedman  1989b). 

Nevertheless,  in  the  end,  it  is  providers  to  whom  patients  turn,  and  providers  whom 
patients  trust  (or  at  least  want  to  trust).  And,  contrary  to  much  of  the  consumerist 
philosophy being espoused these days, one of the most delicate areas into which the
profiling movement threatens to blunder is the relationship of trust between patients and
physicians, nurses, and hospitals. Such relationships are of great importance in the
process of healing, and should not be disrupted cavalierly.

This  is  not  to  be  overly  romantic  about  the  nature  of  the  patient-provider  relationship,  for 
it is inherently unequal. A physician recently asked me, after I had argued that
patients should be provided with more information about their treatment options
(including the right to refuse treatment), how physicians could "survive" an
egalitarian  partnership  with  patients.  I  assured  him  that  the  relationship  can  never  be 
egalitarian.  Physicians  do  not  spend  years  in  education  and  training  for  no  reason.  If  they 
did  not  have  the  specialized  knowledge,  skills,  instincts,  language,  and  experience  they  do, 
we would not reward them so highly with both income and respect, especially when we
often  find  their  ways  mysterious  or  even  irritating. 

Providers  hold  an  information  monopoly.  And  even  the  most  well-informed  patient,  who 
has  studied  up  on  his  or  her  condition  to  a  high  degree,  can  be  brought  to  his  or  her  knees 
by  the  onslaught  of  test  results,  imaging  procedures,  surgical  versus  medical  options,  and 
other  data  that  the  physician  can  produce.  As  a  result,  even  after  the  most  thorough 
informed  consent  discussions  and  presentation  of  treatment  options,  most  patients  will  still 
say,  "So  what  do  you  think,  Doc?  Should  I  have  it  done  or  not?" 

Nevertheless,  providers,  with  proper  encouragement,  incentives,  and  safeguards,  can  and 
do  make  excellent  brokers  of  profiling  information,  and  they  should  be  encouraged  to  get 
into  the  game,  rather  than  being  made  paranoid  and  hostile  by  their  exclusion. 

But  every  rule  has  exceptions.  There  are  indeed  examples  of  the  use  of  outcomes  and 
other  profiling  information  having  a  major  impact  on  the  public,  in  each  case  through  the 
actions  of  brokers.  The  first  is  the  health  care  horror  story:  Dr.  Burt  in  Ohio  butchering 
women's  genitals,  "Dr.  X"  in  New  Jersey  giving  fatal  doses  of  curare  to  patients,  the  fertility 
specialist  who  used  his  own  sperm  to  father  75  children,  the  deaths  of  Libby  Zion  and 
Andy  Warhol.  Is  this  really  use  of  profiling  information?  Yes;  indeed,  at  its  core,  it  is  not 
that  different  from  the  release  of  physician-specific  mortality  data. 

Why  is  the  disclosure  of  this  type  of  information  effective?  Because  we  are  drawn  to 
scandal,  whether  it  involves  Mike  Tyson,  Watergate,  or  Dr.  Burt.  Scandals  individualize 
a  situation;  they  put  a  face  on  it,  and  make  it  easy  for  us  to  relate  to  it.  And  when  it  is 
a  health  care  scandal,  patients  talk  to  providers,  and  each  other,  about  it.  Virtually  every 
dentist  in  the  country  wants  to  know  how  dentist  David  Acer  infected  Kimberly  Bergalis 
and his other patients with the human immunodeficiency virus (HIV), simply so they can reassure
their  patients.  And  scandals  change  clinical  practice;  most  dentists  wear  gloves  now. 

This should not be construed as an endorsement of scandal-mongering as an exemplary
way to disseminate information about provider practice patterns and outcomes. It is
simply to point out that scandal has proven to be a very effective means of communicating
such information to the public.

The  second  example  is  the  battle  over  high  rates  of  cesarean  section  (Friedman  1990).  An 
unintentional  coalition  of  physicians  (who  were  instrumental,  although  their  role  is  often 
overlooked),  payers,  feminist  activists,  consumer  groups,  and  reporters  began  in  the  early 
1980s  to  attack  the  rate  of  cesarean  section,  which  rose  from  7  percent  in  1970  to  24 
percent  in  the  mid-1980s.  Efforts  on  a  variety  of  fronts,  including  patient  education, 
publicity,  public  listing  of  hospitals  with  high  rates,  use  of  protocols,  physician-to-physician 
discussion,  pressure  on  the  American  College  of  Obstetricians  and  Gynecologists,  and, 
ultimately,  changes  in  the  financial  incentives  to  perform  cesareans  that  were  inherent  in 
payment  policy,  led  to  a  stabilization  in  the  primary  cesarean  section  rate  and  a  reduction 
in  the  repeat  cesarean  rate  by  1989.  (The  rates  have  remained  stable  since.) 

This campaign was sparked by, and informed by, data and profiling information, and it
led  to  wider  availability  of  data.  Among  the  responses  was  the  requirement  in  some  states 
that  a  pregnant  patient  be  told  her  prospective  delivery  hospital's  cesarean  rate,  although 
I  have  seen  no  reports  as  to  what  effect,  if  any,  this  form  of  dissemination  has  had. 

Why  was  this  effort  so  successful?  First,  the  procedure  in  question  involved  the 
reproductive  system.  Both  men  and  women  are  very  sensitive  to  any  suggestion  that  their 
genitals,  breasts,  or  other  sexual  organs  are  being  put  at  unnecessary  risk.  It  is  no  accident 
that  three  of  the  clinical  areas  that  have  drawn  the  most  attention  in  the  move  toward 
profiling  have  been  mastectomy,  cesarean  section,  and  prostatectomy. 

John  Wennberg,  M.D.,  has  observed  that  in  order  for  data  to  be  used  effectively  in 
informing  patient  choices,  patients  must  be  aware  of  their  options  for  treatment 
(Wennberg  1990).  He  adds  that  when  this  information  is  available  to  them,  patients  tend 
to  be  far  more  conservative  than  physicians  in  the  choices  they  make.  Although  this  would 
likely  be  true  in  most  situations  involving  the  patient's  own  body  (the  case  of  a  spouse  or 
child might lead to different decisions), it is meaningful that the research that led him to
these conclusions involved prostate problems, which touch on the sexual sensitivities noted above.
Thus the patients with whom he shared data and treatment options were even more likely
to make conservative decisions about surgery.

Another  aspect  of  the  cesarean  situation  is  that  a  number  of  different  elements  were 
involved,  whether  they  were  working  together  or  not.  Almost  every  sector  of  health  care 
and  health  policy  participated  in  some  aspect  of  the  campaign.  Coalitions  do  tend  to  get 
the  word  out  faster  and  have  more  of  an  impact. 

Also,  the  issue  became  politicized.  Those  who  opposed  overuse  of  cesarean  section  for 
reasons  of  quality  and  cost  became  aligned  with  members  of  the  women's  movement  who 
saw  rising  cesarean  rates  as  evidence  of  physicians'  abuse  of  women.  Once  a  health  care 
issue becomes political, policy responses tend to occur in short order. Any conclusions or
recommendations, from whatever source, are more likely to result in policymaking.
Politicizing  the  process  of  creating  profiles  and  establishing  standards  may  not  be  a 
positive  development;  but  it  cannot  be  denied  that  passing  laws  or  enacting  regulations 
greatly  speeds  that  process  and  encourages  (or  mandates)  compliance. 

A  third  example  of  profiling  that  has  changed  public  and  provider  behavior  has  been  the 
generic  drug  revolution  and  related  efforts  to  promote  the  use  of  less  expensive  but 
apparently equivalent alternatives to high-priced pharmaceuticals. One example was the
recent battle over the relative effectiveness of tissue plasminogen activator (TPA) and
streptokinase (Friedman 1989a); another was the consumer effort to convince Burroughs
Wellcome to reduce the cost of zidovudine (AZT). Given the costs involved, it is not
surprising that the search for less expensive alternatives often begins the day a product
is cleared by the Food and Drug Administration.

But  when  it  comes  to  choices  of  drugs,  the  public  is  often  left  out  of  the  decision  making 
(the  AZT  fight  was  an  exception).  The  fact  is  that  the  public,  payers,  and  often  even 
providers  know  so  little  about  many  drugs  that  they  will  accept  almost  any  reasonable 
argument  for  or  against  them,  whether  it  is  that  a  given  drug  cannot  be  duplicated,  or  that 
its  generic  equivalent  is  just  as  good.  In  other  words,  in  the  face  of  an  information  vacuum 
and  total  lack  of  knowledge,  any  kind  of  information  is  welcomed.  That  can  be  a 
dangerous  situation  in  itself. 

What  are  the  lessons  of  these  and  other  examples  of  dissemination  of  what  profiling 
information  we  have,  as  well  as  other  information  about  the  provision  of  health  care  and 
the  practice  of  medicine? 

1.  Although  there  should  be  continued,  reasoned  efforts  to  provide  the  public  with  profiling 
information  in  a  useful  form,  much  of  the  use  of  information,  and  dissemination  to  the  public, 
will  be  by  brokers.  Therefore,  brokers  should  be  involved  in  the  process  of  formatting, 
generating,  analyzing,  and  releasing  profiling  information.  They  also  should  be  educated 
about  how  to  share  the  information  with  the  public  in  a  responsible  and  dispassionate 
manner. 

Two  issues  are  of  paramount  interest  in  this  regard.  The  first  is,  if  brokers  are  the  key 
players,  how  can  the  public  (defined  as  patients  and  potential  patients)  be  involved  in  the 
move  toward  use  of  profiling  information?  One  possibility  is  that  representatives  of  the 
public  be  involved  on  the  "front  end,"  by  participating  in  the  decisions  and  processes  that 
generate  profiling  information,  rather  than  being  the  passive  recipients  of  the  information 
after  it  has  passed  through  many  other  hands  (PPRC  1992).  If  representatives  of  the 
public  can  be  so  involved,  it  would  strengthen  accountability  to  the  public,  regardless  of 
whether  the  information  is  later  widely  disseminated  or  not.  However,  given  the 
impediments  to  public  involvement  elucidated  earlier  in  this  paper,  I  am  pessimistic  that 
such  direct  public  involvement  can  be  achieved. 


Furthermore,  although  many  entities  claim  to  represent  the  public's  interest,  few  can 
document  that  they  do  so  effectively.  Probably  the  "purest"  examples  have  been  the 
single-issue  interest  groups  such  as  those  who  fought  for  vaginal  birth  after  a  previous 
cesarean,  or  an  end  to  combined  biopsy  and  mastectomy  without  specific  patient  consent 
for  the  mastectomy.  It  may  be  that  public  involvement  would  have  to  be  in  the  form  of 
single-issue  efforts  and  discussion,  if  the  motivation  of  the  alleged  public  representatives 
is  not  to  be  suspect. 

Those  who  are  thought  to  represent  the  public  include  the  lay  press,  government, 
consumer  organizations,  and  organized  labor.  Although  each  of  these  has  produced 
articulate,  intelligent,  undoubtedly  sincere  spokesmen,  none  of  them  can  lay  claim  to  truly 
representing the highly heterogeneous, often divided, endlessly complex population of this
country; furthermore, each has powerful hidden agendas that could even work
against  the  public  interest.  In  a  democracy,  we  prize  participatory  processes  and  actions 
above  all;  yet  in  this  case,  we  must  acknowledge  that  no  one  truly  represents  the  people 
as  a  whole.  Government  is  a  payer;  unions  have  a  huge  stake  in  how  health  care  costs  and 
coverage  are  determined;  the  press  always  has  an  ideological  bias;  broad-based  consumer 
organizations  do  as  well,  and  are  usually  lobbyists  in  the  bargain. 

Rather  than  trying  to  find  a  nonexistent  proxy  for  the  public,  perhaps  we  would  be  better 
served  by  conceding  that  none  exists,  and  trying  instead  to  educate,  monitor,  and  provide 
proper  incentives  for  responsible  behavior  to  those  brokers  who  will,  in  the  end,  be  making 
the  decisions. 

An  individual  patient,  working  with  an  individual  provider  in  a  specific  situation,  can  and 
should  be  provided  with,  accept,  consider,  and  act  on  profiling  information.  In  the  macro 
world  of  policy  and  the  press,  however,  brokers  will  prevail. 

The second issue involves the "convener" function (PPRC 1992): the role of bringing
together  all  interested  parties  to  come  to  conclusions,  promote  standards,  and  spark  action, 
based  on  profiling  information. 

I  have  already  argued  that  providers  are  well-suited  (perhaps  best-suited)  for  this  role;  but 
other  candidates  have  been  proposed,  including  an  independent  commission,  a  government 
entity,  or  a  broad-based  community  coalition,  as  is  starting  to  evolve  in  some  metropolitan 
areas. The last of these would be a welcome development, as it would serve two critical needs:
to  have  the  process  of  profiling  be  as  local  as  possible  (thus  reflecting  local  circumstances 
that  would  be  lost  in  huge  national  data  banks),  and  to  involve  as  many  interested  parties 
as  possible.  The  greater  the  credibility  of  the  effort,  the  greater  the  chances  of  success. 

A  "convener"-type  approach,  however,  requires  two  commitments.  First,  the  community 
must  truly  be  represented,  which  means  inclusion  of  the  voiceless,  the  powerless,  the 
unattractive, the non-power-brokers. This makes for a much messier process than the
traditional smoke-filled-room approach, but it might be the only way that the public could
be  truly  involved.  Second,  if  a  consensual,  open  process  is  to  be  used,  it  must  develop 
means  of  involving  the  recalcitrant,  the  hesitant,  the  resistant,  and  the  stubborn.  This 
means  going  beyond  the  voluntary  approach  to  the  use  of  strong  incentives  and,  if 
necessary,  penalties. 

In  my  opinion,  this  is  preferable  to  payers'  controlling  the  use  of  profiling  information. 
But  if  not  all  providers  are  willing  to  participate,  nothing  will  be  accomplished.  It  is 
extremely difficult, on a local level, for businessmen and insurers to strong-arm providers
with  whom  they  do  business  or  socialize.  Nevertheless,  it  is  necessary  if  top-down 
mandates  from  afar  are  to  be  avoided. 

2.  The  patient-physician  relationship,  and,  where  appropriate,  the  hospital-physician 
relationship  should  be  protected  whenever  possible.  Going  off  half-cocked  in  a  frenzied 
attack  on  physicians  and  hospitals  can  and  will  cripple  the  process  of  care  and  healing.  It 
can  also  produce  exactly  the  opposite  effect  from  the  one  desired,  in  that  lay  persons  will 
get  so  tired  of  hearing  about  bad  doctors  and  unsafe  hospitals  that  they  will  start  ignoring 
profiling  information,  even  when  it  is  accurate  and  important. 

Furthermore,  nothing  is  served  by  blackening  the  reputations  of  hospitals  and  physicians 
and  other  providers  with  the  indiscriminate  use  of  information  that  only  tells  part  of  the 
story.  For  example,  the  recent  release  of  physician-specific  mortality  data  for  coronary 
artery  bypass  graft  surgery  in  New  York  state  raises  a  host  of  issues  about  interpretation, 
adequacy  of  adjustments,  and  privacy  (NAHDO  News  1992).  Patients  can  become 
suspicious,  providers  can  become  paranoid;  good  providers  can  be  portrayed  as 
incompetent,  bad  providers  as  excellent. 

It  is  little  wonder  that  the  New  York  state  health  department,  which  has  never  been  known 
for  coddling  providers,  fought  the  public  release  of  these  mortality  rates  with  the  names 
of the physicians accompanying them. It is too late now. Still, whenever data are
to  be  released  or  used,  one  of  the  questions  that  must  be  asked  first  is,  what  harm  will  be 
done?  To  whom?  And  can  that  harm  be  justified? 

3.  Information  is  most  useful  to  people  before  they  become  patients,  but  in  all  probability  they 
are  most  receptive  to  it  when  they  are  patients.  It  is  unlikely  that  Wennberg's  project  to 
provide information on options for treatment of benign prostatic hypertrophy will draw
the  interest  of  many  young  men  with  healthy  prostates;  once  the  condition  has  set  in, 
however,  patients  who  are  candidates  for  prostatectomy  have  proven  very  interested  in  the 
information  (Wennberg  1990). 

People are most interested in (often, are only interested in) information about
conditions to which they and their loved ones are subject. This would suggest, again, that
condition-specific use of profiling information offers the best hope for involving lay
persons, be they patients or potential patients. Indeed, with the advent of genetic testing,
people may well know long in advance that they are candidates for a certain condition.
Providing disease-specific profiling information to those who are interested in that disease,
then, is likely to be the most successful strategy.

4.  Profiling  information  must  be  accurate  and  must  be  used  fairly  and  universally.  One  must 
wonder  at  the  distinctions  made  in  the  evaluation  of  quality  between  hospitals  and  HMOs, 
simply  because  the  federal  government  prefers  HMOs.  We  have  already  had  scandals 
involving  the  quality  of  managed  care  arrangements  for  beneficiaries  of  public  programs, 
with  more  to  come.  Yet  HCFA  is  still  proposing  that  quality  scrutiny  of  HMOs  be 
relaxed. This is also a form of politicization (of introducing ideology into quality review),
and it is as repellent as any other form of ideological bias in what should be a
dispassionate  process.  Profiling  information  must  be  gathered  for  all  providers,  and  used 
to  monitor  all  providers,  regardless  of  their  political  fashionability. 

5. If payers are the main brokers, someone has to protect the uninsured. Some 35 million
people  have  no  health  insurance,  and  studies  suggest  that  they  are  at  higher  risk  of 
receiving  insufficient,  inappropriate,  or  downright  poor  care  (Friedman  1992;  Hadley  et 
al.  1991;  Young  and  Cohen  1991).  Payers  could  well  end  up  as  the  main  users  of  profiling 
information  (whether  by  design  or  default).  It  must  then  be  understood  that  payers  usually 
are  only  interested  in  the  populations  for  whom  they  pay.  That  can  leave  the  uninsured 
possibly  consigned  to  the  worst  providers,  out  of  sight,  out  of  mind.  They  might  well  be 
better  off  receiving  no  care  at  all. 

6.  The  distinction  must  be  made  between  quality  and  cost.  Wennberg  asked,  many  years 
ago,  "Which  rate  is  right?"  (Wennberg  1986).  One's  emotional  response  to  an  epidemic 
of  cesarean  sections,  hospitalizations  for  pediatric  pneumonia,  or  carotid  endarterectomies 
is  that  the  lowest  rate  is  best.  Yet  it  is  certain  that  there  are  women  who  need  cesareans 
and  do  not  get  them.  We  know  that  a  great  many  mentally  ill  individuals  need  care  and 
do  not  receive  it,  even  with  the  explosion  in  boutique  psychiatric  facilities  for  the 
well-insured.  The  entire  profiling  movement  could  be  sullied,  or  even  broken,  if  cost 
containment  and  quality  become  confused  with  each  other.  For  it  is  inevitable  that  the  two 
will  conflict.  The  health  systems  agencies  and  peer  review  organizations  both  learned  this 
to  their  sorrow.  Despite  the  cynicism  that  developed  over  providers'  constant  cries  that 
more  was  always  better,  an  inflexible  stance  that  the  opposite  is  always  true  will  serve 
patients  no  better. 

7.  There  must  be  flexibility  in  the  use  of  profiling  information.  This  is  the  "cookbook 
medicine"  argument,  but  it  has  more  than  a  germ  of  truth:  no  standard,  guideline,  or 
protocol  will  apply  in  all  cases.  There  must  be  some  wiggle  room  in  the  decisions  that  are 
made on the basis of profiling information. David Eddy, M.D., has said that "the instinct
to modify guidelines is extremely strong, right up there with food, sex, and territory." But
over  time,  guidelines  should  be  modified;  the  natural  history  of  acquired  immune 
deficiency  syndrome  and  its  treatment  over  the  past  decade  is  evidence  of  that  truth.  In 
profiling,  nothing  can  be  engraved  in  stone. 

8.  The  effects  of  dissemination  efforts  must  be  monitored.  In  the  health  sciences, 
researchers  have  a  distressing  tendency  to  come  up  with  an  idea,  generate  some  data,  or 
formulate a theorem and then rush off to implement it. Health policy is rife with
examples of such actions, from the enactment of Medicaid to implementation of DRGs.
Yet  only  rarely  does  the  research  community  seek  to  learn  what  happened  as  a  result.  For 
example,  conventional  wisdom  was  that  the  publication  of  new  information,  and  the 
promulgation of voluntary guidelines, were enough to change physician practice. Sometimes
this  is  true;  but  we  now  know  from  the  work  of  Jonathan  Lomas,  M.D.,  Robert  Keller, 
M.D., and others that more is often necessary: peer discussion, peer pressure, recurring
feedback,  even  powerful  financial  incentives  or  regulatory  or  statutory  changes  (Lomas  et 
al.  1989). 

As  nursing  educator  Leah  Curtin  has  said  about  biostatistics,  "Data  always  represent  the 
past;  never  the  future."  That  should  make  monitoring  of  the  generation  and  use  of 
profiling  data  a  mandate.  We  need  to  continue  to  learn  what  works  and  what  does  not  in 
the  dissemination  of  data  as  much  as  in  the  collection  and  analysis  of  data. 

A  deeply  troubling  example  will  suffice.  There  is  one  clinical  intervention  for  which 
cost-benefit  analyses  have  been  completed,  outcomes  information  collected,  provider 
practice  patterns  defined,  efficacy  demonstrated,  and  guidelines  promulgated.  We  know 
that  it  works,  is  inexpensive,  and  is  a  thousand  times  preferable  to  all  alternatives. 
Providers  are  aware  of  this  information  and  have  been  for  a  long  time.  Yet  the  rates  of 
use  of  this  treatment  option  have  been  dropping  for  years.  What  is  it?  Immunization  of 
children  against  measles  and  other  childhood  plagues.  In  some  communities,  fewer  than 
50  percent  of  children  have  been  properly  protected. 

A  more  powerful  example  of  failure  to  implement  outcomes  information,  or  to  change 
practice  patterns,  does  not  exist.  Yet  the  health  services  research  community  has  evinced 
virtually  no  interest  in  the  problem,  policymakers  wring  their  hands  but  do  little,  and  half 
of  all  commercial  insurers  still  refuse  to  cover  immunization. 

If  we  cannot  accomplish  behavior  change  when  we  have  a  clear  set  of  data,  costs  are  low, 
and  outcomes  are  known,  we  must  wonder  about  our  ability  to  convince  the  public  to  use 
far  less  clear-cut  profiling  information  in  far  fuzzier  situations.  A  high  research  priority 
should  be  to  learn  why  immunization  has  been  such  a  spectacular  failure,  and  what  can  be 
done to avoid another such debacle, as well as how to right the wrong we have done our
children.

Lest anything in this essay be misinterpreted, let me state plainly that the development and use
of  profiling  data  promises  to  put  medicine  on  a  more  scientific  and  responsible  footing. 
That  promise,  however,  is  accompanied  by  the  threat  that  important  and  delicate  aspects 
of  health  care  could  be  destroyed  in  the  process.  Equally  important,  the  potential  of  new 
ways  of  evaluating  the  quality  of  care  could  be  betrayed  through  the  arrogance  of 
researchers  and  regulators,  the  fears  of  providers,  the  selfishness  of  payers,  and  the 
innocence  and  stubbornness  of  patients.  If  the  use  of  profiling  data  is  to  serve  all  these 
interests,  it  must  be  an  exercise  in  partnership  among  them. 


REFERENCES 

Altman, L.K., "Surgical Scorecards: Can Doctors Be Rated Just Like Ballplayers?" The
New  York  Times,  January  14,  1992,  p.  C3. 

Friedman,  E.,  "How  Much  Do  Thrombolytics  Really  Cost?"  Health  Business  4(32):1T-2T, 
1989a. 

Friedman,  E.,  "Of  Policy  and  a  Procedure:  How  the  C-Section  Became  a  Consumer 
Issue,"  Health  Business  5(9):1T-4T,  1990. 

Friedman,  E.,  Playing  Doctor:  Who  Will  Control  Medical  Practice  in  the  Year  2000?  (New 
York:  United  Hospital  Fund,  July  1989b). 

Friedman, E., "The Uninsured: From Dilemma to Crisis," Journal of the American Medical
Association  265:2491-2495,  1992. 

Hadley,  J.,  E.P.  Steinberg,  and  J.  Feder,  "Comparison  of  Uninsured  and  Privately  Insured 
Hospital  Patients:  Condition  on  Admission,  Resource  Use,  and  Outcome,"  Journal 
of  the  American  Medical  Association  265:374-379,  1991. 

KPMG  Peat  Marwick,  Health  Benefits  in  1991:  Executive  Summary  (Newark:  KPMG  Peat 
Marwick,  November  1991). 

Lasker,  R.D.,  D.W.  Shapiro,  and  A.T.  Tucker,  "Realizing  the  Potential  of  Profiling,"  in 
Physician  Payment  Review  Commission,  Conference  on  Profiling,  No.  92-2 
(Washington,  DC:  PPRC,  May  1992). 

Lomas, J., G.M. Anderson, K. Domnick-Pierre, et al., "Do Practice Guidelines Guide
Practice?"  The  New  England  Journal  of  Medicine  321:1306-1311,  1989. 

Physician Payment Review Commission, Annual Report to Congress 1992 (Washington, DC:
PPRC,  1992). 

"Physician-Specific  Data  Released,"  NAHDO  News  4(2):  1,  1992. 

Wennberg,  J.,  "Outcomes  Research,  Cost  Containment,  and  the  Fear  of  Health  Care 
Rationing,"  The  New  England  Journal  of  Medicine  323:1202-1204,  1990. 

Wennberg,  J.,  "Which  Rate  is  Right?"  The  New  England  Journal  of  Medicine  314:310-311, 
1986. 

Young,  G.  and  B.  Cohen,  "Inequities  in  Hospital  Care,  the  Massachusetts  Experience," 
Inquiry  28(3):255-262,  Fall  1991. 

